ABSTRACT


We assume that p random variables, $y_1, \dots, y_p$, are distributed according to some multivariate normal distribution (called the p variate normal). Methods of predicting the value of one of them, say $y_p$, given the values of the other p-1 variables are discussed. A study is made of the problems encountered whenever one tries to reduce the number of variables used to predict $y_p$ while at the same time minimizing the loss in prediction accuracy. Modifications of the step-wise procedure of adding predictor variables one at a time are considered in some detail, and methods of using an automatic high speed electronic computer to perform the numerous calculations involved are described. A high speed computer program was written to generate samples from any specified p variate normal.

I wish to express my sincere gratitude to Professor Jack R. Borsting, who in class introduced me to many of the mathematical concepts used in this paper, and as faculty adviser provided the guidance necessary to apply these concepts; and to Mrs. Bette Joe, for her most capable typing of this paper.
Chapter I
INTRODUCTION 


The multivariate normal distribution with p variables, referred to here as "the p variate normal", has been found to be useful as a model for a wide variety of real world phenomena. This distribution has been studied intensively in the literature and has many "nice" mathematical properties.

One of the p variate normal's most useful properties is that when q of the variables are fixed, the remaining p-q variables have a (p-q) variate normal distribution whose variance-covariance matrix is the same regardless of the actual fixed values of the first q variables. When q equals p-1, the variable whose value is not fixed, say $y_p$, becomes a conditional normal random variable whose variance is no greater than the variance of $y_p$ when the variables $y_1, \dots, y_{p-1}$ are not fixed.

In chapter II, methods of "predicting" $y_p$ from known fixed values of the other p-1 variables are described, and methods of measuring the accuracy of prediction in terms of the variance of $y_p$ are given. These methods require that the p variate normal be specified completely by a mean vector, $U$, and a variance-covariance (V-C) matrix, $\Sigma$. In chapter III, methods of approximating the work of chapter II using sample estimates of $U$ and $\Sigma$ are described. These ideas are illustrated by an example in chapter IV.

After mastering the technique of regressing $y_p$ on the other p-1 variables to form a prediction equation for it, we turn to the problem of eliminating variables that may not be useful in predicting the value of $y_p$. Variables are eliminated by removing all reference to them before the prediction equation is computed. Reasons for reducing the number of variables in regression are presented in chapter V. Later in chapter V, the process of eliminating variables from regression is illustrated by an example using a specified five variate normal.

At present, the only known way to find the "optimum set" of r (r ≤ p-1) variables is to compute all regressions. Obviously this involves an extremely large number of computations for large p, so methods involving fewer computations are normally used. Generally these faster methods produce "good" combinations of variables in regression, but often not the "optimum" combination for the same number of variables in regression.

Chapters VI through IX discuss methods of searching for a satisfactorily small set of variables in regression that will reduce the conditional variance of $y_p$ to a satisfactory level. The step-wise procedure, described in chapter VI, provides the basic procedure under study throughout the rest of the paper. Basically, this procedure consists of adding variables to regression in steps. At each step, the variable added is the one whose contribution to variance reduction is greatest at that step. It is demonstrated that this procedure does not always produce optimal combinations of variables in regression.

Also in chapter VI, a statistical test to be applied at each step when a sample is being studied is described. This test provides a criterion, which is a function of the sample size n, for halting the step-wise process.

In chapter VII, automatic regression analysis performed by a high speed digital computer is discussed. Additional halting criteria and other improvements to the step-wise procedure are suggested. Halt criteria proposed by Miller [7] and Efroymson are reviewed in light of the requirements of automatic regression analysis. A modification to the step-wise procedure reflecting differences in the cost of observing variables is considered.

In chapter VIII, computer programs MV REGRESSION and MV SIM, written by the author, are presented. Basically, MV SIM generates samples of a specified size, n, from a given p variate normal, and MV REGRESSION performs regression analyses on these samples. MV SIM also computes the regression parameters of the given p variate normal; these results may be used as a standard for comparison with the results of regression analysis of the samples.

In chapter IX, current and proposed studies using these high speed computer programs are outlined.

Appendix A describes the operation of program MV SIM in detail and gives some background on the techniques used by MV SIM to generate samples from specified p variate normals.

Appendix B describes statistical tests performed by MV SIM on sample mean vectors, $Z$, and sample V-C matrices, $S$. Results of tests performed on a number of generated samples of different sizes from a five variate normal and an 18 variate normal are given.
Chapter II

THE P VARIATE NORMAL DISTRIBUTION


In this chapter we introduce the multivariate normal distribution with p variables, hereinafter called the "p variate normal". The basic theory associated with the p variate normal is given in detail by Graybill [4] and Anderson [1]. Certain theorems and formulas that are important for later work on regression analysis are given here.

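A p variate normal is completely defined by any specified p×1 vector of means, $U$, and any p×p positive definite symmetric variance-covariance (V-C) matrix, $\Sigma$. Let:

$$
Y = \begin{pmatrix} y_1 \\ \vdots \\ y_p \end{pmatrix}, \qquad
U = \begin{pmatrix} u_1 \\ \vdots \\ u_p \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{pmatrix}.
$$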


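The joint density function of the p variate normal, $Y$, is given by:

$$
f(y_1, \dots, y_p) = (2\pi)^{-p/2} \, |\Sigma|^{-1/2} \exp\left[ -\tfrac{1}{2} (Y - U)^T \Sigma^{-1} (Y - U) \right] \tag{2.1}
$$

for $-\infty < y_i < \infty$, $i = 1, \dots, p$.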

The element $\sigma_{ij}$ of $\Sigma$ is the covariance between variables $y_i$ and $y_j$, and the element $u_i$ of $U$ is the mean of $y_i$.

If the p×1 vector $Y$ is partitioned into two subvectors such that

$$
Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}
$$

(vectors $Y_1$ and $Y_2$ are (p-q)×1 and q×1 respectively, q < p), and if

$$
U = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix}
\quad \text{and} \quad
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
$$

are the corresponding partitions of $U$ and $\Sigma$, then it can be shown ([4], section 3.6) that the conditional distribution of the q×1 vector $Y_2$ given the vector $Y_1 = Y_1^*$ (a constant vector), written $Y_2 \mid Y_1^*$, is the multivariate normal distribution with q×1 mean vector

$$
U_2 + \Sigma_{21} \Sigma_{11}^{-1} (Y_1^* - U_1)
$$

and q×q V-C matrix

$$
\Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}.
$$

From the latter matrix we see the important fact that the covariance matrix of the conditional random vector $Y_2 \mid Y_1^*$ does not depend upon the value of $Y_1^*$.
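As a numerical check on these partitioned formulas, here is a minimal pure-Python sketch (our own illustration, not code from MV SIM or MV REGRESSION). The helper names `solve` and `conditional_normal`, and the use of 0-based indices for the fixed variables, are assumptions. $\Sigma_{11}^{-1}$ is never formed explicitly; systems in $\Sigma_{11}$ are solved by Gaussian elimination instead.

```python
def solve(a, b):
    """Solve a x = b for a square matrix a by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("singular matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def conditional_normal(u, sigma, given, y_star):
    """Mean vector and V-C matrix of the unfixed variables given y[given] = y_star."""
    rest = [i for i in range(len(u)) if i not in given]
    s11 = [[sigma[i][j] for j in given] for i in given]
    # x = Sigma11^{-1} (Y1* - U1)
    x = solve(s11, [y - u[i] for y, i in zip(y_star, given)])
    # U2 + Sigma21 x
    mean = [u[i] + sum(sigma[i][g] * xk for g, xk in zip(given, x)) for i in rest]
    # One column of Sigma11^{-1} Sigma12 per unfixed variable j
    cols = {j: solve(s11, [sigma[g][j] for g in given]) for j in rest}
    # Sigma22 - Sigma21 Sigma11^{-1} Sigma12; note that y_star is not used here
    cov = [[sigma[i][j] - sum(sigma[i][g] * ck for g, ck in zip(given, cols[j]))
            for j in rest] for i in rest]
    return mean, cov


mean, cov = conditional_normal([0, 0, 0], [[4, 2, 0], [2, 3, 1], [0, 1, 2]], [0], [2])
print(mean, cov)
```

With $\Sigma$ = ((4, 2, 0), (2, 3, 1), (0, 1, 2)), $U = 0$ and $y_1^* = 2$, the sketch gives the mean (1, 0) and the V-C matrix ((2, 1), (1, 2)). Changing $y_1^*$ moves the mean but leaves the V-C matrix unchanged, as stated above.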

We shall represent the q×q V-C matrix of $Y_2 \mid Y_1^*$ as:

$$
\Sigma_{22 \cdot 1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}. \tag{2.2}
$$

In particular, each diagonal element $\sigma_{ii \cdot 1, \dots, p-q}$ of this matrix ($i = p-q+1, \dots, p$) is the conditional variance of variable $y_i$ in $Y_2$, i.e. the variance of $y_i$ when the p-q variables in $Y_1$ are fixed.

The element $\sigma_{ii}$ in the specified V-C matrix $\Sigma$ is the variance of $y_i$ in the original p variate normal distribution. That $\sigma_{ii}$ is greater than or equal to $\sigma_{ii \cdot 1, \dots, p-q}$ follows from formula (2.2) above and the fact that $\Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}$ is nonnegative definite. In fact, the following relationship holds (where $0 \le R_i \le 1$):

$$
\sigma_{ii \cdot 1, \dots, p-q} = (1 - R_i^2)\, \sigma_{ii}.
$$

In this formula $R_i$ is the multiple correlation coefficient between variable $y_i$ and the vector $Y_1$; see section 3.6.

In this paper we will consider only the case where q = 1. Now

$$
Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix},
$$

where $Y$ is still p×1, $Y_1$ is (p-1)×1, and $Y_2$ is the single variable $y_p$. Similarly, we partition $U$ and $\Sigma$ so that the elements $Y_2$, $U_2$, and $\Sigma_{22}$ become $y_p$, $u_p$, and $\sigma_{pp}$ respectively.

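It follows from the earlier discussion that the distribution of $y_p \mid Y_1^*$ is the univariate normal distribution with (scalar) mean

$$
u_{p \cdot 1, \dots, p-1} = U_2 + \Sigma_{21} \Sigma_{11}^{-1} (Y_1^* - U_1) = u_p + \Sigma_{21} \Sigma_{11}^{-1} (Y_1^* - U_1) \tag{2.3}
$$

and scalar variance

$$
\sigma_{pp \cdot 1, \dots, p-1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} = \sigma_{pp} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}. \tag{2.4}
$$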
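For q = 1, formula (2.4) together with the relation $\sigma_{pp \cdot 1, \dots, p-1} = (1 - R_p^2)\sigma_{pp}$ gives a direct way to obtain the vector $\Sigma_{11}^{-1}\Sigma_{12}$ (called $\beta$ just below), the conditional variance, and the squared multiple correlation. The sketch assumes $\Sigma$ is given as a list of rows with $y_p$ last. `regression_of_last` and `solve` are hypothetical names, and the linear system is solved by Gaussian elimination rather than by inverting $\Sigma_{11}$.

```python
def solve(a, b):
    """Solve a x = b for a square matrix a by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("singular matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def regression_of_last(sigma):
    """Return (beta, sigma_pp.1..p-1, R^2) for predicting the last variable."""
    p = len(sigma)
    s11 = [row[: p - 1] for row in sigma[: p - 1]]
    s12 = [sigma[i][p - 1] for i in range(p - 1)]
    beta = solve(s11, s12)                                 # Sigma11^{-1} Sigma12
    s_pp = sigma[p - 1][p - 1]
    cond = s_pp - sum(b * s for b, s in zip(beta, s12))    # equation (2.4)
    r_squared = 1.0 - cond / s_pp                          # sigma_pp.1..p-1 = (1 - R^2) sigma_pp
    return beta, cond, r_squared


print(regression_of_last([[2, 1], [1, 2]]))
```

For $\Sigma$ = ((2, 1), (1, 2)) this returns $\beta$ = (0.5), conditional variance 1.5 and $R^2 = 0.25$, which is the squared correlation $(1/2)^2$ of the two variables.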
Let $\beta$ be the (p-1)×1 vector defined by $\beta^T = \Sigma_{21} \Sigma_{11}^{-1}$. From formula (2.3), we can write:

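$$
\begin{aligned}
y_p \mid Y_1^* &= u_p + \beta^T (Y_1^* - U_1) + e \\
&= u_p + \sum_{i=1}^{p-1} \beta_i (y_i^* - u_i) + e \\
&= u_p - \sum_{i=1}^{p-1} \beta_i u_i + \sum_{i=1}^{p-1} \beta_i y_i^* + e,
\end{aligned} \tag{2.5}
$$

where $U_1$ is still $(u_1, \dots, u_{p-1})^T$, $Y_1^* = (y_1^*, \dots, y_{p-1}^*)^T$, and $e$ is a normally distributed random variable with mean zero. The variance of $e$ is $\sigma_{pp \cdot 1, \dots, p-1}$, the value of which is independent of the actual values of $y_1^*, \dots, y_{p-1}^*$.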

We define formula (2.5) as the prediction equation for $y_p$ associated with the p variate normal. Most often we shall use it in the form:

$$
y_p \mid Y_1^* - e = E(y_p \mid Y_1^*) = u_p + \sum_{i=1}^{p-1} \beta_i (y_i^* - u_i). \tag{2.6}
$$
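Equations (2.5) and (2.6) are two algebraic arrangements of the same predictor: one centres each $y_i^*$ at $u_i$, the other collects the constant terms into an intercept $u_p - \sum \beta_i u_i$. A small sketch (function names hypothetical; `u` holds $u_1, \dots, u_p$ with $u_p$ last) showing that they agree:

```python
def predict_deviation_form(u, beta, y_star):
    """Equation (2.6): u_p + sum_i beta_i (y_i* - u_i)."""
    return u[-1] + sum(b * (y - m) for b, y, m in zip(beta, y_star, u[:-1]))


def predict_intercept_form(u, beta, y_star):
    """Equation (2.5) without e: (u_p - sum_i beta_i u_i) + sum_i beta_i y_i*."""
    intercept = u[-1] - sum(b * m for b, m in zip(beta, u[:-1]))
    return intercept + sum(b * y for b, y in zip(beta, y_star))


u, beta, y_star = [1.0, 2.0, 10.0], [2.0, -1.0], [3.0, 2.0]
print(predict_deviation_form(u, beta, y_star), predict_intercept_form(u, beta, y_star))
# prints: 14.0 14.0
```

The intercept form is convenient when many predictions are made from one equation, since the constant is computed once.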

Now, if we know the fixed values $Y_1^*$ (in addition to $U$ and $\Sigma$), we can use (2.6) to compute the mean of the conditional random variable $y_p \mid Y_1^*$. A measure of the "error" involved in using the result of (2.6) to "predict" the value of $y_p$, when $Y_1^*$ is known, is given by $\sigma_{pp \cdot 1, \dots, p-1}$. By comparison, if the value of $Y_1^*$ is not known, one might use the original mean of $y_p$, $u_p$, to "predict" the value of $y_p$. The corresponding "error" of this prediction is given by $\sigma_{pp}$, which is greater than or equal to $\sigma_{pp \cdot 1, \dots, p-1}$. The scalars $\beta_i$ in the vector $\beta$ are called partial regression coefficients.
Suppose the computed values of some of the partial regression coefficients, say $\beta_j$, $\beta_k$, etc., are zero or close to zero. Then, insofar as estimating $y_p$ is concerned, one can save the effort and cost of observing the values of $y_j$, $y_k$, etc.

It often happens, especially when the number of variables p is large, that some of the variables themselves can be predicted rather accurately by a linear combination of other variables. In that case, even if none of the partial regression coefficients is close to zero, it may be possible to observe only a select few of the variables and still predict $y_p$ nearly as accurately as when all of the variables are used.

Of course, the values of the partial regression coefficients to be used with each variable depend upon which other variables are used in combination to predict $y_p$. Throughout this paper, any combination of the original p-1 variables that is used to predict $y_p$ in the manner just described will be said to be "in regression". The variables whose values are not to be used to predict $y_p$ we shall say are "not in regression".

Once a combination of variables to be in regression has been chosen, a modified mean vector $U'$ and V-C matrix $\Sigma'$ are formed from the original $U$ and $\Sigma$ respectively by removing $u_j$ from $U$, and $\sigma_{ij}$ and $\sigma_{jk}$ (for all $i$ and $k$) from $\Sigma$, for each variable $y_j$ that is to be "not in regression". (If q of the p-1 variables $y_j$ are to be "not in regression", then $U'$ is (p-q)×1 and $\Sigma'$ is (p-q)×(p-q).) Thus all reference to the variables not in regression is completely removed, and a new (p-q) variate normal is defined by $U'$ and $\Sigma'$, from which new prediction equations ((2.5) or (2.6)) can be computed. Note that it is possible to make up

$$
\sum_{j=1}^{p-1} \binom{p-1}{j} = 2^{p-1} - 1
$$

prediction equations for predicting variable $y_p$, one for each possible combination of the variables $y_1, \dots, y_{p-1}$.
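Forming $U'$ and $\Sigma'$ is just row and column selection, and the number of possible prediction equations is $2^{p-1} - 1$. A minimal sketch with hypothetical function names and 0-based indices. To keep $y_p$ in the reduced normal, its index p-1 must be included in `keep`.

```python
from itertools import combinations
from math import comb


def restrict(u, sigma, keep):
    """Form U' and Sigma' by keeping only the variables whose indices are in `keep`."""
    return [u[i] for i in keep], [[sigma[i][k] for k in keep] for i in keep]


def predictor_subsets(p):
    """All non-empty combinations of the p-1 candidate predictors (indices 0..p-2)."""
    return [c for r in range(1, p) for c in combinations(range(p - 1), r)]


def number_of_prediction_equations(p):
    """sum_{j=1}^{p-1} C(p-1, j), which equals 2**(p-1) - 1."""
    return sum(comb(p - 1, j) for j in range(1, p))


print(number_of_prediction_equations(5), number_of_prediction_equations(18))
# prints: 15 131071
```

For p = 5 this gives 15 equations and for p = 18 it gives 131,071, the figures used in chapter V.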

In the chapters which follow we will discuss methods of estimating the partial regression coefficients $\beta_i$, $\sigma_{pp \cdot 1, \dots, q}$, etc., when the values of $U$ and $\Sigma$ are not known. Methods of choosing which variables are "best" to use in regression will be discussed. We shall also consider the problem of specifying the relative "cost" of observation per unit reduction in $\sigma_{pp \cdot 1, \dots, q}$.




Chapter III

STATISTICAL ANALYSIS OF THE P VARIATE NORMAL


In this chapter we assume that $Y$ has a p variate normal distribution with unknown mean vector $U$ and V-C matrix $\Sigma$. We are now concerned with methods by which an experimenter can estimate $U$ and $\Sigma$ and, subsequently, other parameters such as the regression coefficients of prediction equations for predicting $y_p \mid Y_1^*$, and $\sigma_{pp \cdot 1, \dots, q}$, the conditional variance of $y_p \mid Y_1^*$ when variables $y_1, \dots, y_q$ are in regression. In order to distinguish estimates of parameters from their associated theoretical values, it is convenient to develop new notation to be used throughout this paper, listed here for easy reference:


TABLE I

Notation for theoretical value | Meaning of parameter | Notation for associated estimate
$U$ | p×1 mean vector of the p variate normal | $Z$
$\Sigma$ | p×p V-C matrix of the p variate normal | $S$
$\beta$ (beta) | q×1 vector of regression coefficients associated with the q variables in regression | $B$
$\sigma_{pp}$ | The element in row p, column p of $\Sigma$, which is the (unconditional) variance of $y_p$; $y_p$ is arbitrarily chosen to be the variable to be predicted | $s_{pp}$
$\sigma_{pp \cdot 1, \dots, q}$ | The conditional variance of $y_p \mid Y_1^*$, where $Y_1^*$ is a q×1 vector of fixed values of $y_1, \dots, y_q$ | $s_{pp \cdot 1, \dots, q}$
A sample of size n can be arranged into n×p matrix form as follows:

$$
\begin{pmatrix}
y_{11} & y_{12} & \cdots & y_{1p} \\
y_{21} & y_{22} & \cdots & y_{2p} \\
\vdots & \vdots & & \vdots \\
y_{n1} & y_{n2} & \cdots & y_{np}
\end{pmatrix}
$$

where $y_{ji}$ represents the jth observation of variable $y_i$. Note that for this sample, observations of $y_p$, the variable later to be predicted, are also required. Sample means are computed as:

$$
z_i = \bar{y}_i = \frac{1}{n} \sum_{j=1}^{n} y_{ji}, \qquad i = 1, 2, \dots, p.
$$

Sample covariances are computed as:

$$
s_{ik} = \frac{1}{n-1} \sum_{j=1}^{n} (y_{ji} - \bar{y}_i)(y_{jk} - \bar{y}_k), \qquad i, k = 1, \dots, p.
$$

For i = k the sample covariances become the sample variances:

$$
s_{ii} = \frac{1}{n-1} \sum_{j=1}^{n} (y_{ji} - \bar{y}_i)^2.
$$

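By analogy to the mean vector $U$ and V-C matrix $\Sigma$, we form the p×1 sample mean vector $Z$ and the sample V-C matrix $S$ as follows:

$$
Z = \begin{pmatrix} z_1 \\ \vdots \\ z_p \end{pmatrix}, \qquad
S = \begin{pmatrix} s_{11} & \cdots & s_{1p} \\ \vdots & & \vdots \\ s_{p1} & \cdots & s_{pp} \end{pmatrix}.
$$

It can be verified easily that $Z$ and $S$ are unbiased estimates of $U$ and $\Sigma$ respectively, and that $Z$ and $[(n-1)/n]\,S$ are maximum likelihood estimates of $U$ and $\Sigma$.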
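The estimates $Z$, $S$, and the maximum likelihood estimate $[(n-1)/n]\,S$ can be computed directly from the n×p data matrix. The sketch below (plain Python, hypothetical names, data held as a list of observation rows) follows the formulas above literally rather than using a numerically refined one-pass algorithm.

```python
def sample_mean_and_cov(data):
    """Return Z and S (divisor n - 1) from an n x p list of observation rows."""
    n, p = len(data), len(data[0])
    z = [sum(row[i] for row in data) / n for i in range(p)]
    s = [[sum((row[i] - z[i]) * (row[k] - z[k]) for row in data) / (n - 1)
          for k in range(p)] for i in range(p)]
    return z, s


def ml_cov(s, n):
    """Maximum likelihood estimate of Sigma: [(n - 1)/n] S."""
    return [[(n - 1) / n * v for v in row] for row in s]


z, s = sample_mean_and_cov([[1, 2], [3, 4], [5, 7]])
print(z, s, ml_cov(s, 3))
```

For the three made-up observations (1, 2), (3, 4), (5, 7) the sketch gives $Z = (3, 13/3)^T$ and $S$ = ((4, 5), (5, 57/9)).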

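To develop estimates of the parameters of the conditional distribution of $y_p \mid Y_1^*$, we recall that the random variable $y_p \mid Y_1^*$ is normally distributed with mean and variance given by equations (2.3) and (2.4). We partition $Y$, $Z$, and $S$ as we did $Y$, $U$, and $\Sigma$ respectively in chapter II:

$$
Y = \begin{pmatrix} Y_1 \\ y_p \end{pmatrix}, \qquad
Z = \begin{pmatrix} Z_1 \\ z_p \end{pmatrix}, \qquad
S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & s_{pp} \end{pmatrix},
$$

where, as before,

$$
Y_1^* = \begin{pmatrix} y_1^* \\ \vdots \\ y_{p-1}^* \end{pmatrix} \ \text{(constant vector)}, \qquad
S_{11} = \begin{pmatrix} s_{11} & \cdots & s_{1,p-1} \\ \vdots & & \vdots \\ s_{p-1,1} & \cdots & s_{p-1,p-1} \end{pmatrix}, \qquad
S_{12} = S_{21}^T = \begin{pmatrix} s_{1p} \\ \vdots \\ s_{p-1,p} \end{pmatrix}.
$$

Since $z_i$ and $[(n-1)/n]\,s_{ij}$ are maximum likelihood estimates of $u_i$ and $\sigma_{ij}$ respectively, for $i, j = 1, \dots, p$, it follows from the invariance property of maximum likelihood estimates that

$$
z_p + S_{21} S_{11}^{-1} (Y_1^* - Z_1) \quad \text{and} \quad \frac{n-1}{n} \left[ s_{pp} - S_{21} S_{11}^{-1} S_{12} \right]
$$

are maximum likelihood estimates of $u_p + \Sigma_{21} \Sigma_{11}^{-1} (Y_1^* - U_1)$ and $\sigma_{pp} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}$ respectively.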


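Let:

$$
z_{p \cdot 1, \dots, p-1} = z_p + S_{21} S_{11}^{-1} (Y_1^* - Z_1) \tag{3.1}
$$

and

$$
s_{pp \cdot 1, \dots, p-1} = s_{pp} - S_{21} S_{11}^{-1} S_{12}. \tag{3.2}
$$

It can be shown that $z_{p \cdot 1, \dots, p-1}$ is an unbiased estimate of $u_{p \cdot 1, \dots, p-1}$, the mean of the conditional random variable $y_p \mid Y_1^*$, and that $[(n-1)/(n-p)]\, s_{pp \cdot 1, \dots, p-1}$ is an unbiased estimate of its variance, $\sigma_{pp \cdot 1, \dots, p-1}$; this is the form used in the example of chapter IV.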


Similarly, if we let

$$
B = \begin{pmatrix} b_1 \\ \vdots \\ b_{p-1} \end{pmatrix}, \qquad B^T = S_{21} S_{11}^{-1},
$$

then $B$ is a maximum likelihood estimator of $\beta$ ($\beta^T = \Sigma_{21} \Sigma_{11}^{-1}$); that is, $b_i$ is an M.L.E. of $\beta_i$ for $i = 1, \dots, p-1$. It can be shown that $B$ is also an unbiased estimator of $\beta$.

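We can write $B$ in the form (since $S_{11}$ is positive definite):

$$
B = S_{11}^{-1} S_{12}, \quad \text{i.e.} \quad
\begin{pmatrix} s_{11} & \cdots & s_{1,p-1} \\ \vdots & & \vdots \\ s_{p-1,1} & \cdots & s_{p-1,p-1} \end{pmatrix}
\begin{pmatrix} b_1 \\ \vdots \\ b_{p-1} \end{pmatrix}
=
\begin{pmatrix} s_{1p} \\ \vdots \\ s_{p-1,p} \end{pmatrix}. \tag{3.3}
$$

Equations (3.3) are called the normal equations.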
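The normal equations (3.3) are an ordinary linear system, so $B$ can be found without inverting $S_{11}$. The thesis does not say how MV REGRESSION solves them; the sketch below assumes Gaussian elimination with partial pivoting, and `normal_equations` is a hypothetical name. The input is the full p×p sample V-C matrix with $y_p$ last.

```python
def solve(a, b):
    """Solve a x = b for a square matrix a by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("singular matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def normal_equations(s):
    """Return B solving S11 B = S12 (equations 3.3)."""
    p = len(s)
    s11 = [row[: p - 1] for row in s[: p - 1]]
    s12 = [s[i][p - 1] for i in range(p - 1)]
    return solve(s11, s12)


print(normal_equations([[2, 1, 3], [1, 3, 4], [3, 4, 9]]))
```

For the made-up matrix S = ((2, 1, 3), (1, 3, 4), (3, 4, 9)) the normal equations are $2b_1 + b_2 = 3$ and $b_1 + 3b_2 = 4$, with solution $B = (1, 1)^T$.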

Substituting $z_{p \cdot 1, \dots, p-1}$ for $u_{p \cdot 1, \dots, p-1}$, we obtain an unbiased estimate of the value of the prediction equation (2.6):

$$
\hat{E}(y_p \mid Y_1^*) = z_{p \cdot 1, \dots, p-1} = z_p + S_{21} S_{11}^{-1} (Y_1^* - Z_1) = z_p + \sum_{i=1}^{p-1} b_i (y_i^* - z_i). \tag{3.4}
$$
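Putting the pieces of this chapter together, the estimated prediction (3.4) can be computed from raw data in three steps: form $Z$ and $S$, solve the normal equations for $B$, and evaluate $z_p + \sum b_i (y_i^* - z_i)$. The sketch is self-contained and uses hypothetical names. The data in the example are invented so that $y_3 = 1 + 2y_1 + 3y_2$ holds exactly, in which case the sample regression must recover those coefficients.

```python
def solve(a, b):
    """Solve a x = b for a square matrix a by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("singular matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sample_mean_and_cov(data):
    """Z and S (divisor n - 1) from an n x p list of observation rows."""
    n, p = len(data), len(data[0])
    z = [sum(row[i] for row in data) / n for i in range(p)]
    s = [[sum((row[i] - z[i]) * (row[k] - z[k]) for row in data) / (n - 1)
          for k in range(p)] for i in range(p)]
    return z, s


def estimated_prediction(data, y_star):
    """Return (B, intercept, estimate of E(y_p | Y1*)) following (3.3) and (3.4)."""
    z, s = sample_mean_and_cov(data)
    p = len(z)
    b = solve([row[: p - 1] for row in s[: p - 1]], [s[i][p - 1] for i in range(p - 1)])
    intercept = z[-1] - sum(bi * zi for bi, zi in zip(b, z))   # zip stops at p-1 terms
    value = z[-1] + sum(bi * (yi - zi) for bi, yi, zi in zip(b, y_star, z))
    return b, intercept, value


data = [[0, 1, 4], [1, 0, 3], [2, 2, 11], [3, 1, 10]]   # columns y1, y2, y3
print(estimated_prediction(data, [1, 1]))
```

Here $B$ comes out as (2, 3), the intercept as 1, and the prediction at $y_1^* = y_2^* = 1$ as 6, up to floating-point rounding.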
Chapter IV
AN EXAMPLE


Assume that an experimenter wishes to gather data from some process involving five variables, which he assumes to be related according to a five variate normal distribution. Suppose this five variate normal actually is defined (completely) by the following theoretical mean vector $U$ and V-C matrix $\Sigma$ (footnote 1):


$$
U = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{pmatrix}
= \begin{pmatrix} 7.4600 \\ 48.1500 \\ 11.7700 \\ 30.0000 \\ 95.4200 \end{pmatrix} \tag{4.1}
$$

$$
\Sigma = \begin{pmatrix}
34.6025 & 20.9233 & -31.0517 & -24.1667 & 64.6633 \\
20.9233 & 242.1408 & -13.8783 & -253.4167 & 191.0792 \\
-31.0517 & -13.8783 & 41.0258 & 3.1667 & -51.5192 \\
-24.1667 & -253.4167 & 3.1667 & 280.1667 & -206.8083 \\
64.6633 & 191.0792 & -51.5192 & -206.8083 & 226.3133
\end{pmatrix} \tag{4.2}
$$

1. The values of $U$ and $\Sigma$ used here were computed as the sample mean vector $Z$ and V-C matrix $S$ using data from table 20.4, page 647 of Hald [?]. The results of tables 20.5 and 20.6 of Hald were used to verify the results of computer program MV SIM, which performed most of the computations required for this paper.

Using the developments of chapter II, we let $Y$ be partitioned with $Y_1 = (y_1, y_2, y_3, y_4)^T$ and $Y_2 = y_5$. We know that we are going to be given values of $y_1, y_2, y_3, y_4$, from which we will predict $y_5$. Hence we must set up the prediction equation (2.6) for $y_5$. Accordingly, we partition $U$ and $\Sigma$ as:


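$$
U_1 = \begin{pmatrix} 7.4600 \\ 48.1500 \\ 11.7700 \\ 30.0000 \end{pmatrix}, \qquad U_2 = u_5 = 95.4200,
$$

$$
\Sigma_{11} = \begin{pmatrix}
34.6025 & 20.9233 & -31.0517 & -24.1667 \\
20.9233 & 242.1408 & -13.8783 & -253.4167 \\
-31.0517 & -13.8783 & 41.0258 & 3.1667 \\
-24.1667 & -253.4167 & 3.1667 & 280.1667
\end{pmatrix}, \qquad
\Sigma_{12} = \begin{pmatrix} 64.6633 \\ 191.0792 \\ -51.5192 \\ -206.8083 \end{pmatrix},
$$

$$
\Sigma_{21} = \Sigma_{12}^T = (64.6633,\ 191.0792,\ -51.5192,\ -206.8083), \qquad \Sigma_{22} = \sigma_{55} = 226.3133.
$$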

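For $\beta$ we get

$$
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix} = \Sigma_{11}^{-1} \Sigma_{12} = \begin{pmatrix} 1.5513 \\ .5103 \\ .1021 \\ -.1438 \end{pmatrix}.
$$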

The prediction equation for $y_5$ becomes:

$$
\begin{aligned}
E(y_5 \mid Y_1^*) &= u_5 - \sum_{i=1}^{4} \beta_i u_i + \sum_{i=1}^{4} \beta_i y_i^* \\
&= 95.4200 - (1.5513,\ .5103,\ .1021,\ -.1438) \begin{pmatrix} 7.4600 \\ 48.1500 \\ 11.7700 \\ 30.0000 \end{pmatrix} + (1.5513,\ .5103,\ .1021,\ -.1438) \begin{pmatrix} y_1^* \\ y_2^* \\ y_3^* \\ y_4^* \end{pmatrix}
\end{aligned} \tag{4.3}
$$

or

$$
E(y_5 \mid Y_1^*) = 62.3881 + 1.5513\, y_1^* + .5103\, y_2^* + .1021\, y_3^* - .1438\, y_4^*. \tag{4.4}
$$

The variance of $y_5 \mid Y_1^*$, $\sigma_{55 \cdot 1,2,3,4}$ (the conditional variance of $y_5$ given $y_1, y_2, y_3, y_4$), is:

$$
\sigma_{55 \cdot 1,2,3,4} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} = 226.3133 - 222.3242 = 3.9891,
$$

which is a measure of the prediction error when formula (4.4) is used to predict $y_5$ when $Y_1^*$ is known. By comparison, if the values of $Y_1^*$ were ignored and, instead, the value $u_5 = E(y_5) = 95.4200$ were always used to estimate $y_5$, the corresponding measure of the prediction error would be $\sigma_{55} = 226.3133$.


Thus, by knowing the mean vector $U$ and the V-C matrix $\Sigma$, given by formulas (4.1) and (4.2), we can set up the above prediction equation (4.4). Then for any set of values $y_1^*, y_2^*, y_3^*, y_4^*$ we can make an accurate prediction of $y_5$ without observing its value.
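The figures in (4.3), (4.4), and $\sigma_{55 \cdot 1,2,3,4} = 3.9891$ can be recomputed from (4.1) and (4.2). The sketch below is our own check, not the author's MV SIM; it solves $\Sigma_{11}\beta = \Sigma_{12}$ by Gaussian elimination (an assumption about method). Because the printed entries of $\Sigma$ are rounded to four decimals and $\Sigma_{11}$ is poorly conditioned, the last printed digit of the results may differ slightly from the text, so no exact output is claimed.

```python
def solve(a, b):
    """Solve a x = b for a square matrix a by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("singular matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


U = [7.4600, 48.1500, 11.7700, 30.0000, 95.4200]            # (4.1)
SIGMA = [                                                    # (4.2)
    [34.6025, 20.9233, -31.0517, -24.1667, 64.6633],
    [20.9233, 242.1408, -13.8783, -253.4167, 191.0792],
    [-31.0517, -13.8783, 41.0258, 3.1667, -51.5192],
    [-24.1667, -253.4167, 3.1667, 280.1667, -206.8083],
    [64.6633, 191.0792, -51.5192, -206.8083, 226.3133],
]
s11 = [row[:4] for row in SIGMA[:4]]
s12 = [row[4] for row in SIGMA[:4]]
beta = solve(s11, s12)                                       # Sigma11^{-1} Sigma12
intercept = U[4] - sum(b * u for b, u in zip(beta, U[:4]))  # constant term of (4.4)
cond_var = SIGMA[4][4] - sum(b * s for b, s in zip(beta, s12))
print([round(b, 4) for b in beta], round(intercept, 4), round(cond_var, 4))
```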

The problem facing the experimenter is more complicated than the one discussed in the preceding paragraphs, because he does not know the values of the mean vector $U$ and the V-C matrix $\Sigma$. All he knows is that (by assumption) $y_1, y_2, y_3, y_4, y_5$ are distributed according to some five variate normal distribution and, therefore, are completely specified by some theoretical mean vector $U$ and V-C matrix $\Sigma$ whose actual values he will never know.

Assume the experimenter draws a sample of size 500 from this five variate normal distribution (specified by equations (4.1) and (4.2)). He then computes all sample means, variances, and covariances ($\bar{y}_i = z_i$, $s_{ii}$, $s_{ij}$ respectively) and forms the sample mean vector $Z$ and sample V-C matrix $S$ as defined in chapter III. Suppose, as an example, he obtains the following results upon drawing a sample of size 500:


$$
Z = \begin{pmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \\ z_5 \end{pmatrix}
= \begin{pmatrix} 7.7764 \\ 48.7155 \\ 11.5687 \\ 29.3039 \\ 96.4816 \end{pmatrix},
$$

$$
S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & s_{55} \end{pmatrix}
= \begin{pmatrix}
34.6160 & 20.1495 & -32.8248 & -21.5656 & 64.0139 \\
20.1495 & 217.9056 & -18.3443 & -223.1718 & 171.9229 \\
-32.8248 & -18.3443 & 42.8371 & 7.4936 & -57.4059 \\
-21.5656 & -223.1718 & 7.4936 & 242.3209 & -180.4153 \\
64.0139 & 171.9229 & -57.4059 & -180.4153 & 211.0392
\end{pmatrix}.
$$

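To estimate $\beta$ by $B$ we compute

$$
B = S_{11}^{-1} S_{12} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{pmatrix} = \begin{pmatrix} 1.6360 \\ .5921 \\ .1774 \\ -.0590 \end{pmatrix}.
$$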

Hence, the estimate of the prediction equation (3.4) becomes

$$
\hat{E}(y_5 \mid Y_1^*) = z_5 - \sum_{i=1}^{4} b_i z_i + \sum_{i=1}^{4} b_i y_i^* = 54.5921 + 1.6360\, y_1^* + .5921\, y_2^* + .1774\, y_3^* - .0590\, y_4^*.
$$


The unbiased estimate of the conditional variance of $y_5 \mid Y_1^*$ would be (with n = 500 and p = 5):

$$
\frac{n-1}{n-p} \left[ s_{55} - S_{21} S_{11}^{-1} S_{12} \right] = 4.0368 \times \frac{499}{495} = 4.0694.
$$
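The factor $(n-1)/(n-p)$ has a simple origin: $(n-1)\left[s_{55} - S_{21} S_{11}^{-1} S_{12}\right]$ is the residual sum of squares of the sample regression, which has $n - (p-1) - 1 = n - p$ degrees of freedom. Dividing it by n gives the maximum likelihood estimate, and dividing by n - p gives the unbiased estimate. A short sketch (hypothetical function name):

```python
def conditional_variance_estimates(s_pp_cond, n, p):
    """Return (ML, unbiased) estimates from s_pp - S21 S11^{-1} S12."""
    rss = (n - 1) * s_pp_cond      # residual sum of squares about the fitted plane
    return rss / n, rss / (n - p)  # n - p = n - (p - 1) - 1 degrees of freedom


ml, unbiased = conditional_variance_estimates(4.0368, 500, 5)
print(round(ml, 4), round(unbiased, 4))
# prints: 4.0287 4.0694
```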


Now the experimenter is in a position to predict the value of $y_5$ given a set of values $y_1^*, y_2^*, y_3^*, y_4^*$. For example, suppose he is given that $y_i^* = u_i$, the true means of the $y_i$ (of course, he does not know that these are the true means).

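Using the true prediction equation (4.3), we get

$$
E(y_5 \mid Y_1^* = U_1) = u_5 - \sum_{i=1}^{4} \beta_i u_i + \sum_{i=1}^{4} \beta_i u_i = 95.4195,
$$

which, apart from rounding in the printed coefficients, is simply $u_5 = 95.4200$.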


The experimenter would estimate this value as:

$$
54.5921 + (1.6360)(7.4600) + (.5921)(48.1500) + (.1774)(11.7700) + (-.0590)(30.0000) = 95.6243.
$$
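Both predictions can be reproduced by evaluating the printed equations at $Y_1^* = U_1$. A short check (names hypothetical; the coefficients are the rounded values printed in (4.4) and in the estimated equation above):

```python
U1 = [7.4600, 48.1500, 11.7700, 30.0000]


def evaluate(intercept, coefficients, y_star):
    """Value of intercept + sum_i c_i y_i* for a linear prediction equation."""
    return intercept + sum(c * y for c, y in zip(coefficients, y_star))


true_value = evaluate(62.3881, [1.5513, 0.5103, 0.1021, -0.1438], U1)       # (4.4)
estimated_value = evaluate(54.5921, [1.6360, 0.5921, 0.1774, -0.0590], U1)  # sample equation
print(round(true_value, 4), round(estimated_value, 4))
# prints: 95.4195 95.6243
```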
Chapter V

REDUCTION IN THE NUMBER OF VARIABLES IN REGRESSION - INTRODUCTION


Experience has shown that when the number of variables, p, is large, say over 20, a relatively small number of variables can usually be found to use in regression to predict $y_p$ nearly as accurately as when all p-1 variables are used in regression ([?], page 20). Finding such a small combination of variables is desirable for a number of reasons:

1) The prediction equation has fewer terms; thus it is easier to compute a predicted value of $y_p$.

2) Fewer variables need to be observed in order to make a prediction of $y_p$. Presumably this would reduce the cost of observing variables for each prediction of $y_p$.

3) When p is large, the prediction equation involving p-1 variables requires many computations. Step-wise procedures, described later, produce a prediction equation with much less effort when they yield a relatively small number of variables in regression.

4) When the regression is being performed on a sample, variables that contribute little to the variance reduction of $y_p$ can actually cause the prediction equation to yield a worse fit to the underlying (specified) p variate normal than would result if they were omitted from regression. The reason is that the longer equation can overfit the sample and ascribe some of the variation due to small scale random fluctuations to one of the predictors "by accident".

As one would suspect, whenever a single variable, $y_k$, is added to regression, the old conditional variance, say $\sigma_{pp \cdot 1, \dots, q}$, is always greater than or equal to the new conditional variance, $\sigma_{pp \cdot 1, \dots, q, k}$. However, the amount by which $\sigma_{pp \cdot 1, \dots, q}$ is reduced usually becomes small as the number of variables in regression increases, even when optimal combinations for each number of variables in regression are used. To illustrate this idea, let us consider an example of a regression problem under ideal conditions; that is, we shall examine a p variate normal specified in terms of the vector $U$ and the matrix $\Sigma$.

We first compute the prediction equation for $y_p$, and the associated conditional variance $\sigma_{pp \cdot 1, \dots, q}$, for each possible combination of the variables $y_1, \dots, y_{p-1}$ in regression ($\sum_{j=1}^{p-1} \binom{p-1}{j}$ sets of prediction equations to solve). We shall then group the results according to the number of variables in regression, and from each group pick the "optimal" combination of variables in regression, that is, the combination of variables, say $y_1, \dots, y_q$, in regression producing the smallest $\sigma_{pp \cdot 1, \dots, q}$.(1)

1. Clarification of notation: The reader should understand that whenever a combination of variables in regression, say $y_1, \dots, y_q$, and the associated conditional variance $\sigma_{pp \cdot 1, \dots, q}$ are discussed as in the preceding paragraph, the q variables in regression are not necessarily meant to be regarded as the first q variables as defined by position in the original vector $U$ and matrix $\Sigma$. In other words, to ease notational difficulty, the variables in regression are temporarily relabeled $y_1, \dots, y_q$.
With this grouping, we can now start with one variable in regression and increase the number of variables in regression one at a time, each time choosing the "optimal" combination of variables for that group, until we decide that adding more variables to regression will not reduce $\sigma_{pp \cdot 1, \dots, q}$ enough to make it worthwhile.

In our example, we shall use the five variate normal as defined by (4.1) and (4.2). Using only $y_1$ in regression, the prediction equation becomes:

$$
E(y_5 \mid y_1^*) = u_5 - \beta_1 u_1 + \beta_1 y_1^*,
$$

where

$$
\beta_1 = \Sigma_{21} \Sigma_{11}^{-1} = \frac{\sigma_{15}}{\sigma_{11}} = \frac{64.6633}{34.6025} = 1.8687,
$$

and

$$
\sigma_{55 \cdot 1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} = 226.3133 - \frac{64.6633 \times 64.6633}{34.6025} = 105.4739.
$$
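With a single variable in regression, $\Sigma_{11}$ is the scalar $\sigma_{ii}$, so no matrix inversion is needed: $\beta_i = \sigma_{i5}/\sigma_{ii}$ and $\sigma_{55 \cdot i} = \sigma_{55} - \sigma_{i5}^2/\sigma_{ii}$. The sketch below applies these formulas to each of $y_1, \dots, y_4$ using entries of (4.2); the dictionary and function names are hypothetical.

```python
SIGMA_55 = 226.3133
VAR = {1: 34.6025, 2: 242.1408, 3: 41.0258, 4: 280.1667}             # sigma_ii from (4.2)
COV_WITH_Y5 = {1: 64.6633, 2: 191.0792, 3: -51.5192, 4: -206.8083}  # sigma_i5 from (4.2)


def single_variable_regression(i):
    """Return (beta_i, sigma_55.i) when y_i is the only variable in regression."""
    beta = COV_WITH_Y5[i] / VAR[i]
    cond = SIGMA_55 - COV_WITH_Y5[i] ** 2 / VAR[i]
    return beta, cond


for i in range(1, 5):
    beta, cond = single_variable_regression(i)
    print(f"y{i}: beta = {beta:.4f}, sigma_55.{i} = {cond:.4f}")
```

Up to rounding in the last digit, the results match the q = 1 rows of Table II.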


Similarly, we compute the partial regression coefficients $\beta_i$ and the associated conditional variance for all 15 possible combinations of the variables $y_1, y_2, y_3, y_4$ in regression. Table II shows the results.
Table II

Variables in regression | q | $\beta_1$ | $\beta_2$ | $\beta_3$ | $\beta_4$ | Associated conditional variance $\sigma_{55 \cdot 1, \dots, q}$
$y_1$ | 1 | 1.8687 | | | | 105.4739
$y_2$ | 1 | | .7891 | | | 75.5280
$y_3$ | 1 | | | -1.2557 | | 161.6167
* $y_4$ | 1 | | | | -.7381 | 73.6553
* $y_1, y_2$ | 2 | 1.4683 | .6622 | | | 4.8261
$y_1, y_3$ | 2 | 2.3125 | | .4945 | | 102.2551
$y_1, y_4$ | 2 | 1.4399 | | | -.6139 | 6.2303
$y_2, y_3$ | 2 | | .7313 | -1.0083 | | 34.6208
$y_2, y_4$ | 2 | | .3108 | | -.4569 | 72.4065
$y_3, y_4$ | 2 | | | -1.1998 | -.7245 | 14.6447
$y_1, y_2, y_3$ | 3 | 1.6959 | .6569 | .2500 | | 4.0096
* $y_1, y_2, y_4$ | 3 | 1.4519 | .4160 | | -.2365 | 3.9982
$y_1, y_3, y_4$ | 3 | 1.0518 | | -.4100 | -.6427 | 4.2368
$y_2, y_3, y_4$ | 3 | | -.9234 | -1.4479 | -1.5570 | 6.1506
* $y_1, y_2, y_3, y_4$ | 4 | 1.5513 | .5103 | .1021 | -.1438 | 3.9891

Note: Each group is identified by the value of q.
* Indicates the "optimal" combination of variables for the group (for that number of variables in regression).
We now select the optimal combination from each group of variables in regression as follows (we omit the partial regression coefficients):

Table III

q (group number) | Variables in regression (optimal combination) | Associated conditional variance of $y_5$
0 | None | $\sigma_{55} = 226.3133$
1 | $y_4$ | $\sigma_{55 \cdot 4} = 73.6553$
2 | $y_1, y_2$ | $\sigma_{55 \cdot 1,2} = 4.8261$
3 | $y_1, y_2, y_4$ | $\sigma_{55 \cdot 1,2,4} = 3.9982$
4 | $y_1, y_2, y_3, y_4$ | $\sigma_{55 \cdot 1,2,3,4} = 3.9891$

From table III we immediately see that most of the reduction of
the conditional variance of $y_5$ can be done by introducing only two
of the possible four variables into regression, namely $y_1$ and $y_2$.
Very little more is accomplished by using the other two variables,
given that $y_1$ and $y_2$ are going to be used in regression.

Note that the five variate normal is easily handled by an electronic
computer because only $\sum_{j=1}^{4}\binom{4}{j} = 15$ prediction
equations had to be computed, none with more than five variables
involved. On the other hand, the 18 variate normal, for example,
requires $\sum_{j=1}^{17}\binom{17}{j} = 2^{17} - 1 = 131{,}071$
prediction equations, most of which involve many variables. Hence
this procedure is not always feasible even when today's high speed
electronic computers are available.
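The count of prediction equations in the exhaustive search is $\sum_{j=1}^{p-1}\binom{p-1}{j} = 2^{p-1} - 1$, so it doubles with every added variable. A short Python check (the function name is hypothetical):

```python
from math import comb


def all_subsets_count(p):
    """Prediction equations needed to try every nonempty subset of the p - 1
    candidate predictors of y_p: sum over j = 1..p-1 of C(p - 1, j)."""
    return sum(comb(p - 1, j) for j in range(1, p))


for p in (5, 10, 18, 30):
    print(p, all_subsets_count(p))
```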

It is interesting to note that when all four variables are in
regression the values of the regression coefficients do not suggest
which variables might be best to eliminate from regression. In fact,
none of the values are close enough to zero to indicate that any
should be removed.

In this chapter it has been shown that we can expect the amount
of reduction in the conditional variance of $y_p$ to be less per variable
added to regression when the number of variables in the optimal
combination becomes larger. Thus, if one were willing to state in advance
his maximum allowable value of the conditional variance of $y_p$, the
problem would be a straightforward one of searching table II for the
minimum number of variables producing that conditional variance or
less. We now restate this same problem in the above terms:

"To find some satisfactorily small number of variables,
q ($q \le p-1$), that, when used to predict $y_p$, reduces $\sigma_{pp\cdot 1,\ldots,q}$
to some satisfactorily small fraction of the unconditional variance
of $y_p$."
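Once the optimal conditional variance for each q is known, the restated problem reduces to a simple scan. In this sketch (hypothetical names) `fraction` is the experimenter's acceptable fraction of the unconditional variance, and the values are copied from table III; obtaining such values in general still requires the exhaustive search discussed above.

```python
def smallest_adequate_q(optimal_vars, fraction):
    """optimal_vars[q] is the smallest conditional variance of y_p attainable with
    q variables in regression (q = 0 is the unconditional variance). Return the
    least q whose variance is at most `fraction` of the unconditional variance,
    or None if no q qualifies."""
    limit = fraction * optimal_vars[0]
    for q, var in enumerate(optimal_vars):
        if var <= limit:
            return q
    return None


TABLE_III = [226.3133, 73.6553, 4.8261, 3.9982, 3.9891]  # copied from table III
print(smallest_adequate_q(TABLE_III, 0.05))
```

With a fraction of .05 the answer is q = 2 ($y_1, y_2$), since 4.8261 is below .05 × 226.3133 ≈ 11.32 while 73.6553 is not.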


Chapter  VI 

THE  STEP-WISE  PROCEDURE 

We now discuss an alternate procedure of searching for optimal
combinations of variables in regression, called the step-wise
procedure. This procedure has the advantage of reducing the number of
prediction equations to be solved from $\sum_{r=1}^{p-1}\binom{p-1}{r}$,
as in chapter V, to p-1 or less, thus keeping the number of computations
within the capability of today's high speed electronic computers. We shall
see that the combination of variables selected by this method is not
always optimal, i.e., it is possible that a different set of the same
number of variables might yield a more accurate prediction equation
for $y_p$. However, practical experience indicates that sets decidedly
better than those discovered by the procedure outlined in this chapter
are rare [a], page 19. We shall discuss additional problems
encountered when the step-wise procedure is applied to a sample. The
need for statistical tests at each step is demonstrated and an actual
test is developed.

The step-wise procedure is as follows: At each step every variable
not yet in regression is examined to see how much the conditional
variance of $y_p$ would be decreased if it, alone, were added to the
variables already in regression; i.e., assuming q variables are already
in regression, the quantity $\sigma_{pp\cdot 1,\ldots,q} - \sigma_{pp\cdot 1,\ldots,q,m}$ is computed
for each variable, $y_m$, still not in regression. The variable to be
added to regression is $y_k$, the variable for which this computation is
greatest; i.e., $y_k$ is chosen from the variables not in regression, $y_m$,
so that

$$\sigma_{pp\cdot 1,\ldots,q} - \sigma_{pp\cdot 1,\ldots,q,k} = \max_m\left[\sigma_{pp\cdot 1,\ldots,q} - \sigma_{pp\cdot 1,\ldots,q,m}\right],$$

or equivalently, such that

$$\sigma_{pp\cdot 1,\ldots,q,k} = \min_m\left[\sigma_{pp\cdot 1,\ldots,q,m}\right].$$
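The forward step-wise rule can be sketched as follows. As with the exhaustive search, this is an illustration under assumptions: the data are taken to be Hald's cement data as commonly reproduced (repeated here so the block runs on its own), S uses divisor n - 1, and the helper names are hypothetical. At each step it adds the variable that minimizes $\sigma_{pp\cdot 1,\ldots,q,m}$, which is the same as maximizing the decrease.

```python
# Assumed data: Hald's cement data as commonly reproduced, rows (y1, y2, y3, y4, y5).
HALD = [
    (7, 26, 6, 60, 78.5), (1, 29, 15, 52, 74.3), (11, 56, 8, 20, 104.3),
    (11, 31, 8, 47, 87.6), (7, 52, 6, 33, 95.9), (11, 55, 9, 22, 109.2),
    (3, 71, 17, 6, 102.7), (1, 31, 22, 44, 72.5), (2, 54, 18, 22, 93.1),
    (21, 47, 4, 26, 115.9), (1, 40, 23, 34, 83.8), (11, 66, 9, 12, 113.3),
    (10, 68, 8, 12, 109.4),
]


def covariance(rows):
    """Sample V-C matrix S with divisor n - 1."""
    n, p = len(rows), len(rows[0])
    m = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def cond_var(cov, target, given):
    """Conditional variance of `target` given the variables in `given`."""
    k = len(given)
    a = [[cov[i][j] for j in given] + [cov[i][target]] for i in given]
    for c in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, k):
            f = a[r][c] / a[c][c]
            for j in range(c, k + 1):
                a[r][j] -= f * a[c][j]
    beta = [0.0] * k
    for r in reversed(range(k)):
        beta[r] = (a[r][k] - sum(a[r][j] * beta[j] for j in range(r + 1, k))) / a[r][r]
    return cov[target][target] - sum(b * cov[i][target] for b, i in zip(beta, given))


def forward_stepwise(cov, target, candidates):
    """Add, one at a time, the variable whose entry leaves the smallest
    conditional variance of the target; return [(variable, variance), ...]."""
    chosen, path = [], []
    remaining = list(candidates)
    while remaining:
        k = min(remaining, key=lambda m: cond_var(cov, target, chosen + [m]))
        chosen.append(k)
        remaining.remove(k)
        path.append((k, cond_var(cov, target, chosen)))
    return path


S = covariance(HALD)
path = forward_stepwise(S, target=4, candidates=[0, 1, 2, 3])
for k, var in path:
    print(f"add y{k + 1}: conditional variance {var:.4f}")
```

On the assumed data the order of entry should be $y_4, y_1, y_2, y_3$, as in the illustration that follows.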

We illustrate this procedure by applying it to the five variate
normal specified by equations 4.1 and 4.2. This illustration can
be followed most easily if reference is made to table II of
chapter V.

Step I:    Compute all four conditional variances of group 1, and
           choose the smallest value (73.6553).
           action:  add variable $y_4$ to regression
           results: variables in regression: $y_4$
                    $\sigma_{55\cdot 4} = 73.6553$

Step II:   Compute the conditional variances of group 2 that
           include variable $y_4$ in regression, and choose the
           smallest value (6.2303).
           action:  add variable $y_1$ to regression
           results: variables in regression: $y_1, y_4$
                    $\sigma_{55\cdot 1,4} = 6.2303$

Step III:  Compute the conditional variances of group 3 that
           include variables $y_1$ and $y_4$ in regression, and
           choose the smallest value (3.9982).
           action:  add variable $y_2$ to regression
           results: variables in regression: $y_1, y_2, y_4$
                    $\sigma_{55\cdot 1,2,4} = 3.9982$

Step IV:   Add the last variable.
           results: variables in regression: $y_1, y_2, y_3, y_4$
                    $\sigma_{55\cdot 1,2,3,4} = 3.9891$

As in the preceding chapter, we immediately see that most of
the conditional variance of $y_5$ can be eliminated by using only two
of the possible four variables in regression. However, this time
the pair chosen was $y_1$ and $y_4$ instead of $y_1$ and $y_2$,
producing a conditional variance of 6.2303 instead of 4.8261.

The step-wise procedure is equally applicable to analysis of
a sample of size n. In this case all information is obtained from
the sample vector, Z, and sample V-C matrix, S. In particular,
the values of the sample conditional variances, $s_{pp\cdot 1,\ldots,q}$, rather
than $\sigma_{pp\cdot 1,\ldots,q}$, are used at each step to determine the next
variable to enter regression. As before, p-1 prediction equations,
and associated estimated conditional variances of $y_p$, $s_{pp\cdot 1,\ldots,q}$,
can be obtained. Each succeeding equation will contain one more
variable in regression, and usually will have a smaller value of
$s_{pp\cdot 1,\ldots,q}$.¹ Now, as in chapter V, the most acceptable combination
of variables in regression, for which the estimated conditional
variance of $y_p$ is small enough, can be chosen.

1. The exception can occur when the sample size, n, is small. See
Hald's example, table 20.6, where n = 13.
At this point we must consider a problem that is ever present
whenever a sample is used as a source of information. In the present
case the problem is stated as follows: How do we know that the sample
size, n, was large enough that the conditional variance associated
with the combination we have just selected is accurate? (We will
always assume that n is greater than p.)

Intuitively, if n is just a little larger than p we should not
have much confidence in the sample vector Z and sample V-C matrix S,
nor in the estimated regression coefficients or conditional variances
of $y_p$. In fact, we should not be surprised if a second sample of the
same size were to produce a completely different set of variables when
the same step-wise procedure is used. On the other hand, as n
approaches infinity the sample values Z and S approach the true values
U and $\Sigma$. It is clear that at each step, each variable that is a
candidate to enter regression should be given a statistical test of
some kind.

Suppose q variables, $y_1,\ldots,y_q$, are already in regression with
estimated conditional variance of $y_p$ given by $s_{pp\cdot 1,\ldots,q}$, and suppose
that we are considering variable $y_k$ for addition to regression. It
can be shown that if actually $\sigma_{pp\cdot 1,\ldots,q,k} = \sigma_{pp\cdot 1,\ldots,q}$, then the
statistic

$$F = \frac{(n-q-2)\left(s_{pp\cdot 1,\ldots,q} - s_{pp\cdot 1,\ldots,q,k}\right)}{s_{pp\cdot 1,\ldots,q,k}} \qquad (6.1)$$

has the F distribution with 1 and n-q-2 degrees of freedom [4],
section 6.4. Furthermore, statistic F will tend to be greater than
$F(1, n-q-2)$ if $\sigma_{pp\cdot 1,\ldots,q,k}$ is actually less than $\sigma_{pp\cdot 1,\ldots,q}$.
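Formula 6.1 translates directly into code. In this sketch (the function name is hypothetical) the second degrees-of-freedom value, n - q - 2, is the residual degrees of freedom left after fitting a constant, the q variables already in regression, and $y_k$; because both $s$ values carry the same divisor n - 1, it cancels in the ratio. The check uses the step-1 figures of the Hald example of chapter VII.

```python
def partial_f(s_q, s_qk, n, q):
    """Formula 6.1: F statistic for adding y_k when q variables are already in
    regression, together with its degrees of freedom (1, n - q - 2)."""
    df2 = n - q - 2
    return df2 * (s_q - s_qk) / s_qk, (1, df2)


# Step 1 of the Hald example (n = 13, q = 0): adding y4 lowers 226.3133 to 73.6553.
f, df = partial_f(226.3133, 73.6553, n=13, q=0)
print(round(f, 2), df)
```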

We immediately encounter a new complication: the above statistic
F behaves as stated only so long as variable $y_k$ is studied by itself.
However, the selection of $y_k$ from among those variables still not in
regression was not completely at random; $y_k$ was chosen at this time
because it was estimated to be the "best" variable to add at this
step. In other words, we are in effect computing F for a number of
variables and choosing the variable for which F is the largest. It
is important to realize that, due to this method of selection, the F
statistic used with the selected variable $y_k$ will tend to be larger
than would be expected on the average if variable $y_k$ were studied
as an individual variable alone. Intuitively, this effect should be
stronger with the first variables added to regression, since those
variables for which F is large due to randomness are removed early
from those not in regression. Suggested procedures for compensating
for this are discussed in a later chapter.
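The selection effect can be seen in a small Monte Carlo sketch, with all parameters chosen purely for illustration: y and m candidate predictors are generated independently, so every candidate is useless, and formula 6.1 with q = 0 is evaluated for each. The critical value 4.35 is the tabled $F_{.05}(1, 20)$ for n = 22.

```python
import random


def selection_effect(n=22, m=10, reps=2000, f_crit=4.35, seed=1):
    """Monte Carlo under the null hypothesis: y and m candidate predictors are
    independent standard normals. Returns the fraction of samples in which
    (a) one fixed predictor and (b) the best of the m predictors has F > f_crit."""
    rng = random.Random(seed)
    single = best = 0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        my = sum(y) / n
        syy = sum((v - my) ** 2 for v in y)
        fs = []
        for _ in range(m):
            x = [rng.gauss(0.0, 1.0) for _ in range(n)]
            mx = sum(x) / n
            sxx = sum((v - mx) ** 2 for v in x)
            sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
            r2 = sxy * sxy / (sxx * syy)
            fs.append((n - 2) * r2 / (1 - r2))  # formula 6.1 with q = 0
        single += fs[0] > f_crit
        best += max(fs) > f_crit
    return single / reps, best / reps


single_rate, best_rate = selection_effect()
print(single_rate, best_rate)
```

The first fraction should come out near .05, as the F test promises for a single variable studied alone. The second should come out near $1 - .95^{10} \approx .40$: given y, the ten statistics are independent here, so taking the largest makes a spurious "pass" roughly eight times as likely.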

Let α be the probability of erroneously concluding that
$\sigma_{pp\cdot 1,\ldots,q,k}$ is less than $\sigma_{pp\cdot 1,\ldots,q}$ whenever actually they are
equal (α is usually chosen to be .05). This error is usually called
the type one error. Suppose now that at each step we compute the
statistic F of formula 6.1 and compare it with the value of
$F_{\alpha(1,n-q-2)}$, which can be found in tables of the F distribution.
If the sample size is too small the power of the test will be low.
This means that the actual difference between $\sigma_{pp\cdot 1,\ldots,q}$ and
$\sigma_{pp\cdot 1,\ldots,q,k}$ can be substantial and still the probability that the
computed statistic, F, will exceed $F_{\alpha(1,n-q-2)}$ can be small. (Of
course, this probability will always be greater than α.) Failing to
add $y_k$ in this situation is usually called the type two error.

On the other hand, given that $\sigma_{pp\cdot 1,\ldots,q}$ is actually greater
than $\sigma_{pp\cdot 1,\ldots,q,k}$ (the only alternative being that they are equal),
regardless of how small the actual difference, the probability that
statistic F exceeds $F_{\alpha(1,n-q-2)}$ can be made as close to one as we
please by increasing the sample size, n, indefinitely.

Meanwhile, among those variables for which $\sigma_{pp\cdot 1,\ldots,q}$ is
actually equal to $\sigma_{pp\cdot 1,\ldots,q,k}$, approximately $100\alpha$ percent
are expected to "pass" the F test (i.e., $F > F_{\alpha(1,n-q-2)}$),
independently of the sample size, n.

We have just seen that the two important factors that affect the
probability that variable $y_k$ will pass a particular F test are the
amount by which the actual values of $\sigma_{pp\cdot 1,\ldots,q}$ and $\sigma_{pp\cdot 1,\ldots,q,k}$
differ, and the size of the sample, n. Thus, the decision rule we
might use is to terminate the step-wise procedure at any step at which
all variables still not in regression fail to pass the F test.

With this decision rule, the F test will limit the variables in
regression to those whose contributions to reduction in conditional
variance of $y_p$ appear to be large enough for the given sample to
measure.
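A sketch of this decision rule, applied to the assumed Hald sample (n = 13) with α = .05. The dictionary `F05` holds standard tabled values of $F_{.05}(1, \nu)$ for the residual degrees of freedom that can arise here; the data rows are Hald's cement data as commonly reproduced, and all helper names are hypothetical.

```python
# Assumed data: Hald's cement data as commonly reproduced, rows (y1, y2, y3, y4, y5).
HALD = [
    (7, 26, 6, 60, 78.5), (1, 29, 15, 52, 74.3), (11, 56, 8, 20, 104.3),
    (11, 31, 8, 47, 87.6), (7, 52, 6, 33, 95.9), (11, 55, 9, 22, 109.2),
    (3, 71, 17, 6, 102.7), (1, 31, 22, 44, 72.5), (2, 54, 18, 22, 93.1),
    (21, 47, 4, 26, 115.9), (1, 40, 23, 34, 83.8), (11, 66, 9, 12, 113.3),
    (10, 68, 8, 12, 109.4),
]
# Tabled upper 5% points F_.05(1, df) for the residual degrees of freedom used here.
F05 = {11: 4.84, 10: 4.96, 9: 5.12, 8: 5.32}


def covariance(rows):
    """Sample V-C matrix S with divisor n - 1."""
    n, p = len(rows), len(rows[0])
    m = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def cond_var(cov, target, given):
    """Conditional variance of `target` given the variables in `given`."""
    k = len(given)
    a = [[cov[i][j] for j in given] + [cov[i][target]] for i in given]
    for c in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, k):
            f = a[r][c] / a[c][c]
            for j in range(c, k + 1):
                a[r][j] -= f * a[c][j]
    beta = [0.0] * k
    for r in reversed(range(k)):
        beta[r] = (a[r][k] - sum(a[r][j] * beta[j] for j in range(r + 1, k))) / a[r][r]
    return cov[target][target] - sum(b * cov[i][target] for b, i in zip(beta, given))


def stepwise_with_f_stop(cov, target, candidates, n, f_table):
    """Forward step-wise procedure that halts as soon as the best remaining
    variable fails the F test of formula 6.1 at the tabled level."""
    chosen, remaining = [], list(candidates)
    s_q = cov[target][target]
    while remaining:
        df2 = n - len(chosen) - 2
        trial = {m: cond_var(cov, target, chosen + [m]) for m in remaining}
        k = min(trial, key=trial.get)
        f = df2 * (s_q - trial[k]) / trial[k]
        # y_k has the largest F of all candidates, so if it fails, all fail.
        if f <= f_table[df2]:
            break
        chosen.append(k)
        remaining.remove(k)
        s_q = trial[k]
    return chosen


S = covariance(HALD)
selected = stepwise_with_f_stop(S, 4, [0, 1, 2, 3], n=13, f_table=F05)
print([f"y{k + 1}" for k in selected])
```

With these inputs the procedure should stop after $y_4$ and $y_1$: the best remaining candidate, $y_2$, has F ≈ 5.03, below $F_{.05}(1, 9) = 5.12$.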

In the next chapter we shall consider additional halt criteria
which an experimenter may wish to impose on the step-wise process.
Chapter VII


AUTOMATIC REGRESSION ANALYSIS - CRITERIA
FOR HALTING STEP-WISE REGRESSION

In this chapter we shall develop useful procedures for
conducting automatic regression analysis on a sample of size n of a p
variate normal using a high speed electronic computer. Efroymson
[3] has developed an algorithm very suitable for computer use in
which any single variable can be added to, or eliminated from,
regression (depending upon its former status). At any step the
regression coefficients, conditional variance of $y_p$, multiple
correlation coefficient of $y_p$ on the variables in regression, and many
other desirable parameters can be computed easily and printed out.
Useful criteria for halting the regression process are discussed
and developed.

Given  a  sample  of  size  n  of  a  p  variate  normal,  formulas  for 
computing  vector  Z  and  matrix  S  have  already  been  described.  Also, 
basically  we  shall  use  the  step-wise  procedure  of  adding  variables 
to  regression.  The  most  important  remaining  problem  is  to  consider 
how  the  user  of  an  automatic  regression  analysis  computer  program 
can  specify  in  advance  of  the  computer  run,  reasonable  criteria  for 
halting  the  step-wise  procedure. 

So far, it appears that a satisfactory criterion for stopping
the regression process has never been fully developed to suit
automatic step-wise regression. Miller [7] proposes adding variables
until the F test fails. He also proposes a method of adjusting the
level for which the critical F is chosen ($1 - \alpha$ in chapter VI) to
compensate for the fact that the method of choosing each variable
to enter regression is not a random choice:

In order to derive a test for the statistical significance
of $X_j$, the following analysis may be performed: When a
predictor is chosen at random from a group of predictors,
an F test is performed where the critical F is usually
taken at the 95% level. This allows for a one in twenty
chance for considering this predictor significant when in
fact it is not. In the screening procedure the selection
of $X_j$ is not a random choice. Therefore, it is necessary
to determine at what probability level the critical F
should be taken while still specifying a one in twenty
chance occurrence.

For the screening procedure it appears proper to make the
level for which the critical F is chosen a function of the
number of possible predictors, n. The ordinary 95% level
F can be expressed as

$$F_{.95} = F(1 - 1/20),$$

and for the screening procedure the 95% level is

$$F^{*}_{.95} = F(1 - 1/20n).$$

Intuitively, Miller's solution seems to be somewhat extreme.
For example, if p = 51 (and α = .05), then at the first step the
level chosen for the critical F, α', would be computed as follows:

$$1 - \alpha' = 1 - \frac{1}{20 \times 50} = .999; \qquad \alpha' = 1 - .999 = .001,$$

so that the value used for comparison would be $F_{.001}(1,49) = 12.2$,
rather than $F_{.05}(1,49) = 4.04$ when no adjustment is made. In this
case the critical F value is arbitrarily tripled only because there
are 50 variables still not in regression. Granted that the critical
F should be adjusted upward in order to maintain a "one in twenty
chance occurrence", it would seem that, due to lack of information
as to the extent of this non-random effect, one should make such an
adjustment more conservatively than this.

Perhaps a satisfactory "hedge" might be to use the adjusted
level

$$\alpha' = \frac{\alpha}{\log p} \qquad \text{or} \qquad \alpha' = \frac{K\alpha}{\log p},$$

where K is a constant inserted by the program user for his
particular sample.
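The arithmetic of the two adjustments can be compared in a few lines. This sketch assumes, as a simplification, that the candidates' F tests are independent, which holds only approximately in practice; it also reads the second hedge formula as $\alpha' = K\alpha/\log p$ and takes log to be the natural logarithm, both assumptions. Function names are hypothetical.

```python
import math


def miller_level(alpha, n_candidates):
    """Miller's screening level: alpha' = alpha / (number of candidate predictors)."""
    return alpha / n_candidates


def log_hedge_level(alpha, p, k=1.0):
    """The milder hedge alpha' = K * alpha / log p (natural log assumed;
    K = 1 gives the first form)."""
    return k * alpha / math.log(p)


def chance_any_passes(level, n_candidates):
    """Probability that at least one of n_candidates independent null
    variables passes an F test run at `level`."""
    return 1 - (1 - level) ** n_candidates


p, alpha = 51, 0.05
for name, level in [("none", alpha), ("Miller", miller_level(alpha, p - 1)),
                    ("log p", log_hedge_level(alpha, p))]:
    print(f"{name:7s} alpha' = {level:.4f}  "
          f"P(some null variable passes) = {chance_any_passes(level, p - 1):.3f}")
```

Without adjustment, a useless variable almost certainly passes at the first step (about .92); Miller's level restores roughly the nominal one in twenty, and the log p hedge sits in between.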

Conceivably, one might wish to make no adjustment at all for
this effect because the consequences of increasing the type two
error during the early steps are so detrimental to the step-wise
procedure.

Efroymson  proposes  two  F  tests  at  each  step.  His  program 
first compares each variable, $y_j$, currently in regression with an
appropriate  "min  F"  critical  value  to  see  if  it  still  passes  the 
F  test  of  significance.  If  such  a  variable  is  discovered,  the 
action  at  that  step  is  to  remove  the  variable  from  regression.  By 
setting  min  F  to  a  value  slightly  less  than  the  standard  critical 
value  used  for  adding  variables,  the  possibility  of  creating  an 
endless  loop  is  avoided. 

This  feature  is  appealing  because  new  combinations  in  regression 
obtained  in  this  manner  are  always  more  nearly  optimal  (as  far  as 
the sample is concerned) than was the preceding combination of the
same  size;  yet  the  number  of  computer  instructions  required  to  do 
this  operation  is  minimal. 
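A sketch of a double F test in the spirit of Efroymson's procedure as described above. It is not Efroymson's algorithm [3], which updates the regression in place by pivoting; this version simply recomputes each conditional variance by Gaussian elimination, and all helper names are hypothetical. The data rows are assumed to be Hald's cement data as commonly reproduced. With the artificial values critical F = 3.5 and min F = 3.0 used in the example later in this chapter, it should reproduce the sequence of table IV.

```python
# Assumed data: Hald's cement data as commonly reproduced, rows (y1, y2, y3, y4, y5).
HALD = [
    (7, 26, 6, 60, 78.5), (1, 29, 15, 52, 74.3), (11, 56, 8, 20, 104.3),
    (11, 31, 8, 47, 87.6), (7, 52, 6, 33, 95.9), (11, 55, 9, 22, 109.2),
    (3, 71, 17, 6, 102.7), (1, 31, 22, 44, 72.5), (2, 54, 18, 22, 93.1),
    (21, 47, 4, 26, 115.9), (1, 40, 23, 34, 83.8), (11, 66, 9, 12, 113.3),
    (10, 68, 8, 12, 109.4),
]


def covariance(rows):
    """Sample V-C matrix S with divisor n - 1."""
    n, p = len(rows), len(rows[0])
    m = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def cond_var(cov, target, given):
    """Conditional variance of `target` given the variables in `given`."""
    k = len(given)
    a = [[cov[i][j] for j in given] + [cov[i][target]] for i in given]
    for c in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(c + 1, k):
            f = a[r][c] / a[c][c]
            for j in range(c, k + 1):
                a[r][j] -= f * a[c][j]
    beta = [0.0] * k
    for r in reversed(range(k)):
        beta[r] = (a[r][k] - sum(a[r][j] * beta[j] for j in range(r + 1, k))) / a[r][r]
    return cov[target][target] - sum(b * cov[i][target] for b, i in zip(beta, given))


def double_f_stepwise(cov, target, candidates, n, f_in, f_out, max_steps=50):
    """First remove the weakest variable in regression if its F is below f_out;
    otherwise add the strongest outside variable if its F exceeds f_in;
    otherwise halt. Keeping f_out below f_in prevents an endless loop."""
    chosen, log = [], []
    for _ in range(max_steps):
        q = len(chosen)
        s_q = cond_var(cov, target, chosen)
        if chosen:
            # Removing y_j leaves q - 1 variables, so formula 6.1 uses n - q - 1.
            f_rm = {}
            for j in chosen:
                s_without = cond_var(cov, target, [v for v in chosen if v != j])
                f_rm[j] = (n - q - 1) * (s_without - s_q) / s_q
            weakest = min(f_rm, key=f_rm.get)
            if f_rm[weakest] < f_out:
                chosen.remove(weakest)
                log.append(("remove", weakest))
                continue
        outside = [m for m in candidates if m not in chosen]
        if not outside:
            break
        f_add = {}
        for m in outside:
            s_with = cond_var(cov, target, chosen + [m])
            f_add[m] = (n - q - 2) * (s_q - s_with) / s_with
        strongest = max(f_add, key=f_add.get)
        if f_add[strongest] <= f_in:
            break
        chosen.append(strongest)
        log.append(("add", strongest))
    return chosen, log


S = covariance(HALD)
chosen, log = double_f_stepwise(S, 4, [0, 1, 2, 3], n=13, f_in=3.5, f_out=3.0)
print([(action, f"y{k + 1}") for action, k in log])
```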
In chapter VI it was shown that the choice of $F_{\alpha(1,n-q-2)}$ as
the value of critical F is made in an attempt to limit the variables
in regression to those whose contribution to reduction in conditional
variance of $y_p$ is large enough for the given sample to measure. It
is clear that Efroymson's double F test contributes to this effort
by ensuring that all variables in regression continue to pass the
F test even after subsequent variables have been added.

It  is  impossible  to  anticipate  here  all  uses  for  different 
combinations  of  specified  stopping  criteria.  We  already  have  seen 
that statisticians so far have only provided general guidelines
in  this  area.  This  is  mainly  because  each  individual  p  variate 
normal  distribution  has  its  own  set  of  complications,  and  for  each 
computer  run  on  a  given  sample  the  experimenter  may  have  varying 
amounts  of  prior  information  regarding  the  p  variate  normal  he  is 
studying.  Thus,  for  any  automatic  regression  analysis  computer 
program  it  is  important  that  the  user  of  the  program  be  able  to 
specify  halt  criteria  with  as  much  flexibility  as  possible. 

Perhaps the most important aspect of each halt criterion is
that it must be specifiable in a manner most meaningful to the
experimenter. For example, some experimenters under certain conditions
may not look upon the F test of chapter VI as being useful at
all. Quite likely, such an experimenter may wish to replace
$F_{\alpha(1,n-q-2)}$ with a value, say Δ, to be the critical amount of
reduction of the variance of $y_p$ as a stopping rule; or he may want
to specify both critical values. Although Δ and $F_{\alpha(1,n-q-2)}$ are in
different units, the Δ test is equivalent to an F test, since by
formula 6.1 a reduction of Δ corresponds at any step to
$F = (n-q-2)\Delta/s_{pp\cdot 1,\ldots,q,k}$. In specifying both tests the
experimenter is therefore merely having the computer apply whichever
test is the most stringent at each step.
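A sketch of the two tests side by side, with hypothetical function names. The figures are from step 3 of the Hald example: adding $y_2$ to $y_1, y_4$ lowers the conditional variance from 6.2303 to 3.9982 with n = 13 and q = 2.

```python
def equivalent_f(delta, s_qk, n, q):
    """F value (formula 6.1) that corresponds to a variance reduction of exactly
    delta when the conditional variance after adding y_k is s_qk."""
    return (n - q - 2) * delta / s_qk


def enters(s_q, s_qk, n, q, f_crit, delta_crit):
    """y_k enters only if it passes both the F test and the Delta test, so the
    stricter of the two thresholds is the one that binds."""
    reduction = s_q - s_qk
    f = (n - q - 2) * reduction / s_qk
    return f > f_crit and reduction > delta_crit


print(enters(6.2303, 3.9982, 13, 2, f_crit=3.5, delta_crit=1.0))
print(enters(6.2303, 3.9982, 13, 2, f_crit=3.5, delta_crit=3.0))
print(round(equivalent_f(3.0, 3.9982, 13, 2), 2))
```

Here F ≈ 5.02 and the reduction is about 2.23, so the variable passes when Δ = 1.0 but fails when Δ = 3.0; at this step Δ = 3.0 acts like a critical F of about 6.75.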

The  following  example  illustrates  most  of  the  points  covered 
in  the  last  few  paragraphs.  We  show  here  how  a  suitable  choice  of 
min  F  and  critical  F,  artificially  chosen,  can  aid  Efroymson's 
double  F  test  procedure  to  find  a  more  nearly  optimal  combination 
of  variables  in  regression  than  already  obtained  by  the  step-wise 
procedure at a previous step. To do this we take an example worked
out by Hald [6], section 20.3. In this example, Hald used data
from a sample of size 13 of a five variate distribution which we
will assume here to be normal. The sample vector, Z, and sample
V-C matrix, S, are the same as those shown by equations 4.1 and
4.2 in this paper, which in chapter IV were used to define U and
$\Sigma$, respectively. In the following illustration we shall consider
4.1 and 4.2 to be the computed Z and S, as in Hald's example.

From Hald's example we compute the F statistic (6.1) for each
variable in regression and not in regression at each step. See
table IV below. For variables in regression, $y_k$, the value computed
is the F statistic that would be computed for $y_k$ if it, alone,
were removed from regression first. These values pertaining to
variables currently in regression are underscored in table IV. In
order to illustrate the above points, the F test using $F_{\alpha(1,n-q-2)}$
was eliminated.

We now choose the artificial values of critical F and min F to
be 3.5 and 3.0, respectively. With this choice we shall obtain the
optimal combination of variables $y_1$ and $y_2$, whereas the regular
forward step-wise procedure yielded variables $y_1$ and $y_4$ in
chapter VI.

Table IV

       Variables in Regression    Computed F Statistic (6.1)
Step   Before This Step           y1          y2          y3          y4

1      none                       12.60       21.96       4.40        (22.80)
2      y4                         (108.16)    .17         40.30       _22.80_
3      y1, y4                     _108.16_    (5.03)      4.24        _159.21_
4      y1, y2, y4                 _154.02_    _5.03_      .01         _1.86_
5      y1, y2 (the optimal combination for two variables in regression)

(Underscored values are shown as _x_.)


The variables added to or eliminated from regression were chosen
according to Efroymson's double F test procedure. Recall that no
variable was to be added at any given step if the F value of one of
the variables already in regression got below 3.0 (min F). Hence
at step 4, variable $y_4$ was eliminated, yielding the optimal
combination $y_1, y_2$. At each previous step, the variable added (whose
value is enclosed in parentheses) was chosen because its F statistic
was the largest among those still not in regression and was also
greater than the critical F value, which was artificially chosen to be 3.5.

This example illustrates some complexities that arise during
the regression process that are still not completely explainable
analytically. For instance, the relative values of statistic F
changed drastically as the combination of variables in regression
changed. These values correspond to relative amounts of reduction
of conditional variance that would be due to the corresponding
variable if it were (or is) in regression. Thus, when $y_4$ was added
to regression at the first step, the relative contribution in variance
reduction due to $y_1$ jumped from 13 to 108, implying that $y_1$ and $y_4$
are much more powerful together than the sum of their contributions
when each is used alone.


This example also suggests reasons why an experimenter may wish
to specify critical F values artificially, especially if results of
prior computer runs are available.

It was suggested earlier that, instead of keeping track of computed
values of formula 6.1, which requires the experimenter to specify
artificial critical F's, it might be simpler for him to keep track of
actual amounts of variance reduction of $y_p$ and make up artificial
values of Δ in units of variance reduction of $y_p$. Also it is clear
that the experimenter may wish to specify a value, say, min Δ, which
would become the critical amount of variance reduction required of
each variable in regression in order to stay in regression.

The following summary lists a few useful halt criteria which the
experimenter may wish to specify before the automatic regression
analysis is performed on a given sample. The automatic regression
program should permit the experimenter to specify any combination
of these criteria for any given computer run:

1. $F_{\alpha(1,n-q-2)}$ and min F (chapter VI)

2. Δ and min Δ (defined above)

3. Stop when the conditional variance of $y_p$ gets as low as
V percent of the original variance, $s_{pp}$.

4. Stop when the conditional variance of $y_p$ gets as low as T.

5. Stop when W variables have been added to regression.
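One way a program might hold these options is sketched below; the class, field, and function names are hypothetical, and `None` marks a criterion the user did not specify. Criteria 1 and 2 act on individual variables as they are added or removed, so only criteria 3 to 5 are checked after each step here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HaltCriteria:
    """Hypothetical container for the halt criteria listed above."""
    f_crit: Optional[float] = None     # 1. critical F for adding a variable
    min_f: Optional[float] = None      # 1. min F for keeping a variable (below f_crit)
    delta: Optional[float] = None      # 2. minimum variance reduction to add
    min_delta: Optional[float] = None  # 2. minimum variance reduction to stay
    v_percent: Optional[float] = None  # 3. stop at V percent of s_pp
    t_var: Optional[float] = None      # 4. stop at conditional variance T
    w_vars: Optional[int] = None       # 5. stop after W variables


def should_halt(c, s_pp, s_now, n_in):
    """Check criteria 3 to 5 after a step; s_now is the current conditional
    variance of y_p and n_in the number of variables in regression."""
    if c.v_percent is not None and s_now <= c.v_percent / 100 * s_pp:
        return True
    if c.t_var is not None and s_now <= c.t_var:
        return True
    if c.w_vars is not None and n_in >= c.w_vars:
        return True
    return False


print(should_halt(HaltCriteria(v_percent=5.0), 226.3133, 4.8261, 2))
```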

In  chapter  IX  we  shall  propose  a  procedure  using  some  of  the 
above  halt  criteria  in  searching  for  an  optimal  combination  of  vari¬ 
ables  in  regression. 

In chapter V it was stated that one good reason for reducing
the number of variables in regression might be to reduce the cost
of observing the variables from which each future prediction of $y_p$
is to be computed. Often some of the variables cost considerably
more to observe than others, and the experimenter may not be so
interested in reducing the total number of variables to observe
as he is in reducing the total cost to observe the values of the
variables in regression for each prediction of $y_p$ to be made later.

Thus, it is desirable that the experimenter be able to specify
observation costs, $c_j$ (say, in dollars), for each "independent"
variable $y_1,\ldots,y_{p-1}$, and have the automatic regression analysis
operation reflect these costs when selecting variables to go into
regression.

The "cost option" should differ from the regular option only in
the criterion used at each step to determine which variable is to be
added to regression. Recall that the regular option calls for choosing
the variable that will reduce the variance of $y_p$ the most to be the
variable added.

In the cost option, at each step, those variables still not in
regression are determined. Then, instead of the $y_k$ for which
$\sigma_{pp\cdot 1,\ldots,q,k}$ is least (as estimated by $s_{pp\cdot 1,\ldots,q,k}$), the $y_j$ is
chosen for which $c_j/(s_{pp\cdot 1,\ldots,q} - s_{pp\cdot 1,\ldots,q,j})$ is least; i.e., $y_j$
is chosen on the basis that it is cheapest in terms of "dollars" to
observe per unit of variance reduction of $y_p$ due to adding $y_j$. It is
clear that the standard option is just a special case of the cost
option in which the observation costs are all specified to be equal.
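The cost option's selection rule is a small change to the standard choice. In this sketch (hypothetical names) `trial_vars` maps each candidate j to $s_{pp\cdot 1,\ldots,q,j}$ and `costs` maps it to $c_j$; the variances are group 1 of table II, and the costs are arbitrary illustrative values.

```python
def pick_by_cost(s_q, trial_vars, costs):
    """Cost option: choose y_j minimizing c_j / (s_q - s_{q,j}), i.e. the fewest
    dollars per unit of variance reduction of y_p."""
    def dollars_per_unit(j):
        reduction = s_q - trial_vars[j]
        return costs[j] / reduction if reduction > 0 else float("inf")
    return min(trial_vars, key=dollars_per_unit)


# Group 1 of table II (nothing yet in regression): s_55 = 226.3133.
group1 = {1: 105.4739, 2: 75.5280, 3: 161.6167, 4: 73.6553}
print(pick_by_cost(226.3133, group1, {1: 1, 2: 1, 3: 1, 4: 1}))   # equal costs
print(pick_by_cost(226.3133, group1, {1: 1, 2: 1, 3: 1, 4: 10}))  # y4 ten times dearer
```

With equal costs the rule picks $y_4$, exactly as the standard option would; making $y_4$ ten times as expensive shifts the choice to $y_2$, whose variance reduction (150.79) is nearly as large.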

Since optimality is now measured in terms of minimum cost to
observe per unit of variance reduction instead of maximum variance
reduction, the program user must be able to specify a halt criterion
so that whenever the cost to observe a variable in regression becomes
greater than, say, max C, the program will remove it, and
whenever all variables still not in regression would cost more than,
say, C dollars per unit of variance reduction, if added, the program
should halt regression. Now, min Δ and Δ are not needed as halting
criteria for the cost option. However, the experimenter should
still have the option of including the other halt criteria summarized
above.


To  summarize,  neither  Miller’s  nor  Efroymson's  stopping  rules 
are  optimal.  Both  basically  use  only  the  statistical  F  test  of 
chapter  VI  as  a  decision  rule  for  halting.  It  has  been  illustrated 
here  that  additional  decision  criteria  that  can  be  specified  by  the 
experimenter  in  terms  more  meaningful  to  him,  may  greatly  facilitate 
his  search  for  optimal  combinations  of  variables  in  regression. 
Chapter VIII

THE MV REGRESSION AND MV SIM COMPUTER PROGRAMS


The purpose of this chapter is to describe a computer program,
called MV REGRESSION, which performs automatic regression analysis
on a sample of size n. Also briefly outlined is program MV SIM,
which generates samples of size n from a specified p variate normal.
The detailed operation of MV SIM is described in appendix A. Both
programs are written in NELIAC compiler language. Operation of
these programs on the Control Data Corporation model 1604 computer
at the U. S. Naval Postgraduate School has produced all of the
computations involved in the examples throughout this paper as well as
the test results discussed in chapter IX and appendix B.

Briefly, MV SIM will analyze a specified p variate normal (given
by U and $\Sigma$) and print out true regression coefficients and associated
conditional variances $\sigma_{pp\cdot 1,\ldots,q}$ for any set(s) of q variables
specified by the program user ($q \le p-1$). Next, MV SIM will generate
a sample of size n from the specified p variate normal and compute
sample vector Z and sample V-C matrix S. Before turning control to
program MV REGRESSION, MV SIM performs statistical tests on Z and S,
and prints out results of these tests, but takes no action based on
these results. These statistical tests and actual computer run results
are discussed in detail in appendix B.
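MV SIM's own method is described in appendix A and is not reproduced here. For readers who want to experiment, one standard way to draw from a specified p variate normal is to factor $\Sigma = LL^{T}$ (Cholesky) and set $y = U + Lz$, with z a vector of independent standard normals. The sketch below does this in plain Python and then computes Z and S; the function names and the two-variable U and $\Sigma$ are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def sample_mvn(mu, sigma, n, seed=None):
    """n observations from the p variate normal N(mu, sigma), via y = mu + L z."""
    rng = random.Random(seed)
    L = cholesky(sigma)
    p = len(mu)
    sample = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        sample.append([mu[i] + sum(L[i][k] * z[k] for k in range(i + 1))
                       for i in range(p)])
    return sample


def mean_and_vc(rows):
    """Sample vector Z (means) and sample V-C matrix S (divisor n - 1)."""
    n, p = len(rows), len(rows[0])
    z = [sum(r[j] for r in rows) / n for j in range(p)]
    s = [[sum((r[i] - z[i]) * (r[j] - z[j]) for r in rows) / (n - 1)
          for j in range(p)] for i in range(p)]
    return z, s


U = [1.0, -2.0]
SIGMA = [[4.0, 1.2], [1.2, 1.0]]
Z, S = mean_and_vc(sample_mvn(U, SIGMA, n=20000, seed=3))
print(Z, S)
```

For a large n, Z and S should lie close to the specified U and $\Sigma$, which is the kind of comparison with true values described below.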

Before proceeding with a description of MV REGRESSION, it is
interesting to consider the powerful research tool one has when he
can specify a p variate normal (U and $\Sigma$) and quickly generate
random samples from that distribution. It is obvious that this
operation saves much time in gathering data, or in "making up"
reasonable samples when it is desired to test the operation of a
regression analysis program such as MV REGRESSION. (This was the
case when computations for the examples in this paper were required.)
But MV SIM offers the statistician a much more useful research
capability than this. Using MV SIM one can make accurate comparisons
of the results of any regression scheme with true regression
equations, conditional variances, etc., which MV SIM computes from
the specified U and $\Sigma$. Of course, for such a comparison, the
regression scheme must be applied to a sample drawn by MV SIM from
the distribution specified by U and $\Sigma$.

The sampling capability of MV SIM also makes it possible to perform empirical sampling studies of random variables whose distributions are difficult to find theoretically. One such study, now in progress, is discussed in chapter IX.

We  finish  this  chapter  with  a  detailed  description  of  program 
MV  REGRESSION. 

The inputs of MV REGRESSION are as follows:

1. Start with a sample of n observations of the p variate normal. If MV SIM supplies the sample, it will supply it in the form of Z and S.

2. Specify "standard" or "cost" option (see chapter VII).

   If cost option, give the cost of observation, $c_i$, for variables $y_i$, for $i = 1,\ldots,p-1$. If the user specifies "standard", he still may specify costs and obtain printed cost data, even though the "regular" criterion is used as far as entering variables into regression is concerned.

3. Specify criteria for halting regression of a sample:

   A. 1) $F_{\alpha(1,n-q-1)}$, the value to be compared with statistic F for adding variables to regression.
      2) Min F, a value less than $F_{\alpha(1,n-q-1)}$, to be compared with statistic F for removing variables from regression.

   B. 1) Last variable added reduced the conditional variance of $y_p$ by less than a specified amount (not used for cost option).
      2) Last variable added, $y_k$, costs more than C dollars to observe per unit of variance reduction of $y_p$ due to adding $y_k$ (used only for cost option).

   C. Conditional variance of $y_p$ became less than T.

   D. Number of variables in regression reached W.

Before step 1 of the regression operation, MV REGRESSION prints out $s_{pp}$ and (optionally) the RR matrix. (The RR matrix is a p×p matrix which contains all current data in compact form, from which all required parameters at each step can be computed. Initially, it is the matrix of sample correlation coefficients, which is easily computed from the sample V-C matrix S. See Efroymson [3].)
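The starting RR matrix and the update made when a variable enters regression can be sketched as a correlation matrix followed by a Gauss-Jordan "sweep" on the entering variable's diagonal element. This is a minimal sketch under that assumption: the function names are hypothetical, and the sign convention (pivot row divided by the pivot, pivot column negated) is an assumption about MV REGRESSION's internals. The S below is the sample V-C matrix of sample 1 of the example run that follows (n = 300); the final scaling of $s_{pp}$ by $(n-1)/(n-q-1)$ is likewise an assumption, chosen because it appears to reproduce the printed step 1 conditional variance.

```python
import numpy as np

# Illustrative sketch; function names are hypothetical, not from MV REGRESSION.
def correlation_from_cov(S):
    """Initial RR matrix: r_ij = s_ij / sqrt(s_ii * s_jj)."""
    S = np.asarray(S, dtype=float)
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)

def pivot(A, k):
    """Gauss-Jordan sweep on diagonal element k (enter variable k)."""
    A = np.asarray(A, dtype=float)
    akk = A[k, k]
    B = A - np.outer(A[:, k], A[k, :]) / akk   # a_ij - a_ik * a_kj / a_kk
    B[k, :] = A[k, :] / akk                    # pivot row divided by a_kk
    B[:, k] = -A[:, k] / akk                   # pivot column divided and negated
    B[k, k] = 1.0 / akk
    return B

# Sample V-C matrix of sample 1 (Y1..Y5) from the example run.
S = [[ 31.5587,   14.6327, -28.5108,  -17.8985,   55.3288],
     [ 14.6327,  227.5360,  -3.9154, -243.4449,  171.7598],
     [-28.5108,   -3.9154,  39.1032,   -7.6404,  -39.8334],
     [-17.8985, -243.4449,  -7.6404,  276.5990, -191.4721],
     [ 55.3288,  171.7598, -39.8334, -191.4721,  199.0421]]

RR = pivot(correlation_from_cov(S), 3)   # step 1: enter Y4 (index 3)
n, q = 300, 1
# Assumed scaling from RR_pp to the estimated conditional variance of Y5.
cond_var = S[4][4] * RR[4, 4] * (n - 1) / (n - q - 1)
print(round(RR[4, 4], 4), round(cond_var, 4))
```

Because a sweep only needs the current matrix, each later step is one more call to `pivot`, which is what makes the forward procedure cheap.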

At each step, after a variable has been added to regression, the following data are printed:

I.   a. "Best" variable to have been added (the variable with the maximum reduction of the conditional variance of $y_p$).
     b. "Cheapest" variable to have been added.
     c. Whichever of the two variables above actually was added (a. if regular option, b. if cost option).

II.  The value used in the F test for the added (or removed) variable. MV REGRESSION compares this value with the input value of $F_{\alpha(1,n-q-1)}$ (or Min F).

III. a. The square of the estimated new multiple correlation coefficient of $y_p$ on the variables in regression.
     b. The estimate of the new conditional variance of $y_p$, $s_{pp\cdot 1,\ldots,q}$.

IV.  The cost to observe the variable just added, $y_k$, per unit of conditional variance reduction due to the addition of this variable to regression at this time. This is computed as

     $$c_k \,/\, (s_{pp\cdot 1,\ldots,q} - s_{pp\cdot 1,\ldots,q,k}).$$

V.   a. A list of the new set of variables, $y_i$ ($i = 1,\ldots,q$), in regression.
     b. The estimated regression coefficients, $b_i$.

VI.  The cost to observe the new set of variables in regression per unit of total variance reduction of $y_p$. This is computed as

     $$\Big(\sum_{i=1}^{q} c_i\Big) \,/\, (s_{pp} - s_{pp\cdot 1,\ldots,q}).$$

VII. The new RR matrix (optional).


As indicated earlier, it is possible to specify costs of observation, $c_i$, even though the standard option is used. In this case, items IV and VI are still computed and printed, but of course the "best" variable to add (item I a) is still the one actually added.

At each step at which a variable is removed from regression, item I above becomes "the variable just removed", and only items II, III, V, VI, and VII are printed.
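The two cost figures (items IV and VI) are simple ratios, and a short worked check against the example run that follows shows how they are formed. This sketch plugs in the printed conditional variances of Y5 and the specified costs of Y1, Y2 and Y4; the function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical, not from MV REGRESSION.
def added_variable_cost(c_k, var_before, var_after):
    """Item IV: dollars per unit of conditional variance reduction due to y_k."""
    return c_k / (var_before - var_after)

def set_cost(costs, s_pp, var_now):
    """Item VI: total cost of the set per unit of total variance reduction."""
    return sum(costs) / (s_pp - var_now)

s_pp = 199.0421                        # sample variance of Y5
var_step = [66.7210, 5.8889, 4.1344]   # conditional variances after steps 1-3
costs = {1: 10.0, 2: 12.0, 4: 20.0}    # observation costs of Y1, Y2, Y4

step1 = added_variable_cost(costs[4], s_pp, var_step[0])        # Y4 at step 1
step3 = added_variable_cost(costs[2], var_step[1], var_step[2]) # Y2 at step 3
total3 = set_cost([costs[1], costs[2], costs[4]], s_pp, var_step[2])
print(round(step1, 4), round(step3, 4), round(total3, 4))
```

The small differences from the printed values (for example 6.8396 against 6.8394) come from using the rounded, printed variances as inputs.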

Minor  changes  to  the  program  can  be  made  to  cause  it  to  print 
out  other  data  after  each  step,  such  as  estimated  variances  of  the 
estimated  regression  coefficients. 

The next few pages show the actual program output of a regression analysis performed by MV REGRESSION on a sample of size 300 of a five variate normal. This sample was generated by the MV SIM program using the input vector U and V-C matrix Σ given by formulas 4.1 and 4.2.
MULTIVARIATE ANALYSIS (CONTINUED)  2 19 1963  PAGE 7

COMPUTER RUN DATA
NUMBER OF SAMPLES = 3

CRITERIA FOR CHOOSING WHICH VARIABLE TO ADD TO
REGRESSION (AMONG THOSE PASSING F TEST)

MAXIMUM REDUCTION OF THE CONDITIONAL VARIANCE OF Y 5

HOWEVER,

THE FOLLOWING COSTS OF OBSERVATION ARE SPECIFIED

       Y1        Y2        Y3        Y4        Y5
    10.0000   12.0000   16.0000   20.0000     .0000

ANY ONE OF THE FOLLOWING CONDITIONS CAN HALT REGRESSION STEPS

1) NUMBER OF VARIABLES IN REGRESSION REACHED 4
2) CONDITIONAL VARIANCE OF Y 5 BECAME LESS THAN 4.0
3) LAST VARIABLE ADDED REDUCED THE CONDITIONAL VARIANCE
   OF Y 5 BY LESS THAN 2.0
4) LAST VARIABLE ADDED COSTS MORE THAN 10.00 DOLLARS
   TO OBSERVE PER UNIT OF VARIANCE REDUCTION OF Y 5
5) NO MORE VARIABLES (AMONG THOSE NOT IN REGRESSION)
   PASS THE F TEST OF SIGNIFICANCE
SAMPLE NUMBER 1
SAMPLE OF SIZE 300 OF THE 5 VARIATE NORMAL

SAMPLE MEANS

       Y1        Y2        Y3        Y4        Y5
     7.4125   48.2586   11.9077   29.8140   95.4458

SAMPLE VARIANCE COVARIANCE MATRIX

             Y1         Y2         Y3         Y4         Y5
Y1      31.5587    14.6327   -28.5108   -17.8985    55.3288
Y2      14.6327   227.5360    -3.9154  -243.4449   171.7598
Y3     -28.5108    -3.9154    39.1032    -7.6404   -39.8334
Y4     -17.8985  -243.4449    -7.6404   276.5990  -191.4721
Y5      55.3288   171.7598   -39.8334  -191.4721   199.0421


ANALYSIS OF SAMPLE NUMBER 1

SAMPLE VARIANCE OF Y 5 = 199.0421

F LEVEL TO ENTER = 3.87    F LEVEL TO REMOVE = 3.7

MATRIX TO START

             Y1        Y2        Y3        Y4        Y5
Y1       1.0000     .1726    -.8116    -.1915     .6981
Y2        .1726    1.0000    -.0415    -.9703     .8070
Y3       -.8116    -.0415    1.0000    -.0734    -.4515
Y4       -.1915    -.9703    -.0734    1.0000    -.8160
Y5        .6981     .8070    -.4515    -.8160    1.0000
STEP 1

BEST VARIABLE TO ADD WAS Y 4
CHEAPEST VARIABLE TO ADD WAS Y 2
VARIABLE ADDED WAS Y 4

STATISTIC USED TO COMPARE WITH F(1, 298) = 595.9689

NEW MULTIPLE CORR COEFF SQUARED = .3340

NEW CONDITIONAL VARIANCE = 66.7210

COST TO OBSERVE Y 4 IN DOLLARS PER UNIT VARIANCE REDUCTION = .1511

NEW SET OF VARIABLES IN REGRESSION
    4

COEFFICIENTS
B(I)    BO = 116.0842
    -.6922     .0000     .0000     .0000     .0000

COST TO OBSERVE THIS SET OF VARIABLES PER UNIT
OF VARIANCE REDUCTION OF Y 5

    20.0000 DOLLARS DIVIDED BY
   132.3210 UNITS OF VARIANCE REDUCTION = .1511

THE NEW RR MATRIX

             Y1        Y2        Y3        Y4        Y5
Y1        .9632    -.0132    -.8256     .1915     .5417
Y2       -.0132     .0583    -.1128     .9703     .0152
Y3       -.8256    -.1128     .9946     .0734    -.5114
Y4       -.1915    -.9703    -.0734    1.0000    -.8160
Y5        .5417     .0152    -.5114     .8160     .3340
STEP 2

BEST VARIABLE TO ADD WAS Y 1
CHEAPEST VARIABLE TO ADD WAS Y 1
VARIABLE ADDED WAS Y 1

STATISTIC USED TO COMPARE WITH F(1, 297) = 3089.6640

NEW MULTIPLE CORR COEFF SQUARED = .0293

NEW CONDITIONAL VARIANCE = 5.8889

COST TO OBSERVE Y 1 IN DOLLARS PER UNIT VARIANCE REDUCTION = .1643

NEW SET OF VARIABLES IN REGRESSION
    1    4

COEFFICIENTS
B(I)    BO = 102.8895
    1.4124    -.6008     .0000     .0000     .0000

COST TO OBSERVE THIS SET OF VARIABLES PER UNIT
OF VARIANCE REDUCTION OF Y 5

    30.0000 DOLLARS DIVIDED BY
   193.1531 UNITS OF VARIANCE REDUCTION = .1553

THE NEW RR MATRIX

             Y1        Y2        Y3        Y4        Y5
Y1       1.0380    -.0137    -.8571     .1988     .5624
Y2        .0137     .0581    -.1241     .9730     .0226
Y3        .8571    -.1241     .2868     .2376    -.0470
Y4        .1988    -.9730    -.2376    1.0380    -.7082
Y5       -.5624     .0226    -.0470     .7082     .0293
STEP 3

BEST VARIABLE TO ADD WAS Y 2
CHEAPEST VARIABLE TO ADD WAS Y 2
VARIABLE ADDED WAS Y 2

STATISTIC USED TO COMPARE WITH F(1, 296) = 127.14672

NEW MULTIPLE CORR COEFF SQUARED = .0205

NEW CONDITIONAL VARIANCE = 4.1344

COST TO OBSERVE Y 2 IN DOLLARS PER UNIT VARIANCE REDUCTION = 6.8394

NEW SET OF VARIABLES IN REGRESSION
    1    2    4

COEFFICIENTS
B(I)    BO = 75.6177
    1.4258     .3643    -.2792     .0000     .0000

COST TO OBSERVE THIS SET OF VARIABLES PER UNIT
OF VARIANCE REDUCTION OF Y 5

    42.0000 DOLLARS DIVIDED BY
   194.9076 UNITS OF VARIANCE REDUCTION = .2154

THE NEW RR MATRIX

             Y1        Y2        Y3        Y4        Y5
Y1       1.0413     .2360    -.8864     .4285     .5677
Y2        .2360   17.1986   -2.1349   16.7347     .3895
Y3        .8864    2.1349     .0218    2.3150     .0012
Y4        .4285   16.7347   -2.3150   17.3214    -.3292
Y5       -.5677    -.3895     .0012     .3292     .0205

3) LAST VARIABLE ADDED REDUCED THE CONDITIONAL VARIANCE
   OF Y 5 BY LESS THAN 2.0
Chapter IX

CURRENT  STUDIES  AND  PROPOSALS  FOR  FUTURE  RESEARCH 


In  this  chapter  we  discuss  tests  that  have  been  started  using 
programs  MV  SIM  and  MV  REGRESSION.  Also,  plans  for  future  research 
are  proposed. 

Some  tests  (described  in  Appendix  B)  of  a  large  number  of 
samples  generated  by  MV  SIM  have  been  completed. 

An empirical sampling study of the random variables involved in the F test of chapter VI has been started, since the form of their distribution is unknown and extremely difficult to obtain in closed form. Actually, p-1 random variables, which we will call $G_1,\ldots,G_{p-1}$, are under study at the same time. They are defined by a specified p variate normal, the size n of each sample of the p variate normal, and the method of computing values of $G_i$, $i = 1,\ldots,p-1$, from a sample, which is described next.

At step one, $G_1$ is defined as the maximum value of F, where F is computed for each of the p-1 variables (none of which are in regression yet). $G_2$ is dependent upon $G_1$ in the sense that $G_2$ is the value of max F computed after the variable for which F equals $G_1$ has been entered into regression. Thus, at step two, max F is the maximum value of F for those p-2 variables still not in regression. The step-wise procedure continues without the use of any tests for halting, so that a new variable is added at each step. Thus, at step i, $G_i$ equals max F, where F is computed for each variable still not in regression at step i. After $G_i$ is recorded, the variable for which F = $G_i$ is entered into regression.
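The sampling study above can be sketched as follows: draw a sample, run the forward procedure with no halting test, and record the largest F at each step. This is a minimal sketch under stated assumptions: F is taken as the usual F-to-enter with (1, n-q-2) degrees of freedom, q being the number of variables already in regression (which matches the F(1, 298) printed at step 1 of the example run with n = 300); the covariance matrix is hypothetical; NumPy's generator stands in for MV SIM; and the function names are hypothetical.

```python
import numpy as np

# Illustrative sketch; function names and Sigma are hypothetical.
def rss(X, y, cols):
    """Residual sum of squares of centered y regressed on columns `cols` of X."""
    if not cols:
        return float(y @ y)
    A = X[:, cols]
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def g_values(sample):
    """One value of each G_1..G_{p-1} from a sample whose last column is y_p."""
    n = sample.shape[0]
    X = sample[:, :-1] - sample[:, :-1].mean(axis=0)   # centering = intercept
    y = sample[:, -1] - sample[:, -1].mean()
    remaining = list(range(X.shape[1]))
    in_reg, G = [], []
    while remaining:                    # no halting test: every variable enters
        q = len(in_reg)
        rss_q = rss(X, y, in_reg)
        F = {}
        for k in remaining:
            rss_new = rss(X, y, in_reg + [k])
            F[k] = (rss_q - rss_new) / (rss_new / (n - q - 2))
        best = max(F, key=F.get)        # G_i = max F at step i
        G.append(F[best])
        in_reg.append(best)
        remaining.remove(best)
    return G

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.3, 0.5],
                  [0.3, 1.0, 0.4],
                  [0.5, 0.4, 1.0]])
# 50 samples of size 100; column i of G holds the values of G_{i+1}.
G = np.array([g_values(rng.multivariate_normal(np.zeros(3), Sigma, size=100))
              for _ in range(50)])
```

Keeping n fixed across the 50 repetitions is what makes each column a set of draws of the same random variable.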
Since the values of $G_i$ depend upon the sample size, we see that each sample of size n of a specified p variate normal produces one value of each of the random variables $G_1,\ldots,G_{p-1}$. Also, to obtain repeated sets of values of the same random variables, the sample size must be kept constant.

The tests that have been completed were performed on the five variate normal specified by formulas 4.1 and 4.2. Each of six sample sizes, 50, 100, 150, 200, 250, and 300, has been run 50 times. The results for $G_1$, $G_2$, $G_3$, $G_4$ for the sample size 100 are plotted below in the form of estimated cumulative distribution functions (c.d.f.'s). Where feasible, the graphs also show the curve of the c.d.f. of $F_{(1,n-q-1)}$. (Recall that if the F test of chapter VI had been applied, each value of $G_{q+1}$ would have been compared with $F_{\alpha(1,n-q-1)}$ at step q+1, for q = 0, 1, 2, 3.)
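An estimated c.d.f. of the kind plotted here is the empirical step function of the recorded values, and its distance from an F c.d.f. can be summarized by the largest vertical gap (the Kolmogorov-Smirnov distance). The sketch below is an illustration only: the function names are hypothetical, the 50 "G" values are placeholder draws from an F distribution rather than results of the study, and the degrees of freedom (1, 98) for n = 100 are an assumption that should be matched to the chapter VI convention.

```python
import numpy as np
from scipy import stats

# Illustrative sketch; function names are hypothetical.
def ecdf(values):
    """Sorted values and empirical c.d.f. heights i/N at those points."""
    x = np.sort(np.asarray(values, dtype=float))
    return x, np.arange(1, len(x) + 1) / len(x)

def max_cdf_gap(values, dfn, dfd):
    """Largest vertical distance between the empirical c.d.f. and F(dfn, dfd)."""
    x, h = ecdf(values)
    Fx = stats.f.cdf(x, dfn, dfd)
    above = np.max(h - Fx)                    # just after each jump
    below = np.max(Fx - (h - 1.0 / len(x)))   # just before each jump
    return float(max(above, below))

rng = np.random.default_rng(1)
g1 = stats.f.rvs(1, 98, size=50, random_state=rng)   # placeholder for 50 G_1 values
x, h = ecdf(g1)
gap = max_cdf_gap(g1, 1, 98)
```

Plotting `h` against `x` as a step function, together with `stats.f.cdf` on a grid, gives the kind of graph described in the text.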
So far the min A parameter test has not been implemented in program MV REGRESSION, so the type of artificial control of the step-wise process described in chapter VII has not been tested. However, a number of samples of size 300 of an 18 variate normal (specified by the same 18 x 18 V-C matrix used in Appendix B) have been processed by MV REGRESSION, using rather wide limits on the halt criteria. After examination of the first run it was obvious that three variables in regression were too many and that either one or two would be the right number.

Since the sample size was large, most samples allowed nine or more variables to enter regression on the basis of passing the F test, even though nearly all of the variables beyond two reduced the estimated conditional variance of $y_{18}$ by less than 1.0 unit. By comparison, the first variable usually reduced the estimated variance of $y_{18}$ from about 18.6 to about 6.5. An examination of the computed statistics of all variables (whether in regression or not) made it apparent that some test such as the min A test might be quite useful here.

Advantage was taken of the fact that the true p variate normal was known when samples obtained from it were being analyzed by MV REGRESSION. For example, after the first run on several samples of the 18 variate normal, only six of the 17 possible predictors ever got into regression by the third step. Hence, all possible pairs of these six variables were fed back to MV SIM, which computed the true conditional variance of $y_{18}$ for each pair.
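Feeding all pairs of a few candidate predictors back into the true distribution amounts to evaluating $\sigma_{pp} - \sigma_{pS}\Sigma_{SS}^{-1}\sigma_{Sp}$ for every two-element set S; six candidates give 15 pairs. This minimal sketch uses a hypothetical 4 variate covariance matrix in place of the 18 variate one, and the function names are hypothetical.

```python
import itertools
import numpy as np

# Illustrative sketch; function names and Sigma are hypothetical.
def cond_var(Sigma, subset, target):
    """True conditional variance of y_target given the variables in subset."""
    idx = list(subset)
    s = Sigma[idx, target]
    return Sigma[target, target] - s @ np.linalg.solve(Sigma[np.ix_(idx, idx)], s)

def rank_pairs(Sigma, candidates, target):
    """All pairs of candidate predictors, smallest conditional variance first."""
    Sigma = np.asarray(Sigma, dtype=float)
    scored = [(cond_var(Sigma, pair, target), pair)
              for pair in itertools.combinations(candidates, 2)]
    return sorted(scored)

# Hypothetical 4 variate normal; y4 (index 3) is the variable predicted.
Sigma = [[1.0, 0.0, 0.0, 0.6],
         [0.0, 1.0, 0.0, 0.3],
         [0.0, 0.0, 1.0, 0.0],
         [0.6, 0.3, 0.0, 1.0]]
ranking = rank_pairs(Sigma, [0, 1, 2], 3)
print(ranking[0])   # best pair and its true conditional variance
```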

The various halt criteria suggested in chapter VII can be useful in developing methods of searching for optimal combinations of variables in regression. It is proposed that procedures such as the one described below be tested and compared with the procedures already described, to see if better results can be obtained.

We will assume that an experimenter has a large sample that perhaps was very expensive to obtain. We shall permit the experimenter two computer runs on the same sample, the first run providing a set of feedback data for the second run.

The main purpose of the first computer run is to determine a lower bound on the conditional variance of $y_p$. This is accomplished by using the F (and min F) test with the step-wise procedure, with α set to permit most variables to enter regression. Of course, at each step valuable information, such as the conditional variance of $y_p$ and the amounts of variance reduction due to each variable, should be printed.

From the first run the experimenter chooses the maximum number of variables, say m, that he will have in his final prediction equation. This is usually easy to do by examining the decreasing values of $s_{pp\cdot 1}, s_{pp\cdot 1,2}, \ldots, s_{pp\cdot 1,\ldots,q}$, where q, $q \ge m$, represents the number of variables in regression after the first computer run.

The purpose of the second computer run is to make a rather thorough (but not exhaustive) search for the optimal combination of m variables in regression. The procedure is to conduct p-1 separate regressions, each regression starting with a different first variable and continuing until m variables are in regression. At each step (after the first), the variable chosen to enter regression will be the variable that can contribute the most reduction in the conditional variance of $y_p$, unless adding this variable would produce a combination that had been in regression previously (during a previous regression). For example, if the first regression added variables in the order $y_i, y_j, y_k$, and the second regression proceeded as $y_k, y_j$, then variable $y_i$ would not be permitted to enter regression next. Instead, the second best variable would be chosen at this step.

Thus, after the second computer run is completed, the experimenter will have (p-1)×m prediction equations (and conditional variances of $y_p$) to choose from, p-1 for each number of variables in regression.
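The second-run search can be sketched as p-1 greedy forward regressions that remember every combination already visited. This is a minimal sketch under stated assumptions: it works from a covariance matrix (in practice the sample V-C matrix), "most reduction" is measured by the resulting conditional variance, a regression simply stops early if every extension would repeat an earlier combination (the text does not cover that case), and the function names and the 4 variate matrix are hypothetical.

```python
import numpy as np

# Illustrative sketch; function names and the example matrix are hypothetical.
def cond_var(cov, subset, target):
    """Conditional variance of y_target given the variables in subset."""
    idx = list(subset)
    s = cov[idx, target]
    return cov[target, target] - s @ np.linalg.solve(cov[np.ix_(idx, idx)], s)

def second_run(cov, target, m):
    """p-1 forward regressions, one per starting variable, never repeating
    a combination that occurred in an earlier regression."""
    cov = np.asarray(cov, dtype=float)
    predictors = [i for i in range(cov.shape[0]) if i != target]
    seen = set()
    equations = []                     # (variables in regression, cond. variance)
    for first in predictors:
        current = [first]
        seen.add(frozenset(current))
        equations.append((tuple(current), cond_var(cov, current, target)))
        while len(current) < m:
            options = sorted(
                (cond_var(cov, current + [k], target), k)
                for k in predictors
                if k not in current and frozenset(current + [k]) not in seen)
            if not options:            # assumption: stop if every choice repeats
                break
            v, k = options[0]          # best variable still permitted
            current.append(k)
            seen.add(frozenset(current))
            equations.append((tuple(current), v))
    return equations

Sigma = [[1.0, 0.0, 0.0, 0.6],
         [0.0, 1.0, 0.0, 0.3],
         [0.0, 0.0, 1.0, 0.0],
         [0.6, 0.3, 0.0, 1.0]]
eqs = second_run(Sigma, target=3, m=2)   # (p-1) x m = 3 x 2 equations
```

In this example the second regression, starting with $y_2$, may not add $y_1$ because the pair was already found by the first regression, so it takes $y_3$ instead.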

Two further investigations are proposed. In Appendix B, the results of tests of a number of samples of a five and an 18 variate normal are described. As a result of the failure of the sample V-C matrices, S, of the 18 variate normal to pass the chi-square test, it is proposed that further testing of the multivariate normal generator be conducted. As indicated in Appendix B, the possibility of round-off error should be considered.

It is also suggested that a study be made to ascertain which of the two suggested tests of the matrix S is better. Possibly such a study would indicate weaknesses in both. Anderson [1], section 10.8, describes a third test of matrix S.

The step-wise procedure of regression analysis as described in this paper is called the "forward" method because it starts with no variables in regression and adds them to regression one at a time. The forward method is used because it permits computational shortcuts, so that the number of computations can be minimized (especially so when Efroymson's computer program algorithm [3] is used). The backward operation of removing extraneous variables, however, offers no computational advantages; see Quenouille [?]. Another reason the forward procedure can be done with fewer computations is that usually the number of variables in the final regression is much less than p-1. Often the reason for examining a large number of independent variables, compared to the number finally used, is that from the variables actually measured, additional variables are created to account for possible curvilinearity and interaction. For example, if $X_1$ is a variable whose value was actually measured, variables such as $Y = X_1^2$ may be computed and used as part of the original p-1 possible predictors; see [?], page 20.

One possible advantage of the backward method is that it starts the process by computing an estimate of the lowest possible value of the conditional variance of $y_p$, $s_{pp\cdot 1,\ldots,p-1}$. If somehow this value could be obtained before the forward procedure was performed, one could estimate the amount of reduction available in the combination of variables still not in regression at each step. Knowledge of this value at each step should be useful in deciding which way would be best to go next, i.e., eliminate the weakest variables now in regression, add the strongest variable still not in regression, or halt.
Appendix A


GENERATION  OF  THE  P  VARIATE  NORMAL 
BY  PROGRAM  MV  SIM 


For the construction of each sample (of size one) from the specified p variate normal, MV SIM uses an independent sample of size p from the normal (0,1) distribution (i.e., mean μ = 0, variance σ² = 1).

To obtain each independent normal random sample (of size one), MV SIM computes a function of an independent sample of size 12 from the uniform (0,1) distribution (i.e., uniform on the interval zero to one). That this function only approximates normally distributed random numbers will be shown below.

It follows from the above that to generate a sample of size n of a p variate normal, n × p × 12 random numbers from the uniform (0,1) random number generator are required.

A discussion of several techniques for generating uniformly distributed "pseudo" random numbers is given by Barron [2]. Empirical test procedures are also given.

The particular uniform (0,1) pseudo random number generator used by MV SIM is a subroutine called RAND. RAND was programmed according to specifications given by Green, Bert F., Jr., Smith, J. E., and Klem, Laura. The number of initial random numbers (n in the reference) used by RAND is seven. This article also discusses a number of empirical tests that have been applied to this method.

The method by which MV SIM uses 12 independent uniform (0,1) random numbers to compute each pseudo normal (0,1) random number is discussed by Vaa [10]. Briefly, each normal random number is computed as:

$$x = \sum_{j=1}^{12} w_j - 6,$$

where the $w_j$ are the required independent sample of size 12 from the uniform (0,1) distribution. The variance of the uniform (0,1) distribution is one-twelfth, and variances of independent random variables are additive. Hence it is convenient to select 12 as the number of uniform random variables whose sum will approximate a normal variable, since the sum then has variance exactly one. Means of independent uniform variables are also additive (the sum of 12 has mean six), so that it remains to subtract the constant six from the sum of 12 independent uniform (0,1) random variables to approximate the normal (0,1) distribution. Vaa has a discussion of the advantages and disadvantages of this "truncated" approximation to the normal distribution, whose values can never fall outside the interval from -6 to 6.

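The sum-of-12 rule is easy to check numerically. In this minimal sketch NumPy's default generator stands in for the RAND subroutine (an assumption: RAND itself was a different generator), and the function name is hypothetical.

```python
import numpy as np

# Illustrative sketch; NumPy's generator replaces RAND, name is hypothetical.
def approx_normal(rng, size):
    """Approximate N(0,1) draws: sum of 12 uniform(0,1) numbers minus 6."""
    w = rng.random((size, 12))    # each row holds w_1, ..., w_12
    return w.sum(axis=1) - 6.0    # mean 12*(1/2) - 6 = 0, variance 12*(1/12) = 1

rng = np.random.default_rng(1963)
x = approx_normal(rng, 100_000)
print(x.mean(), x.var(), np.abs(x).max())   # near 0, near 1, never above 6
```

The bounded range is the price of the method: a true normal variable exceeds 6 in absolute value with probability of about 2 in a billion, whereas these draws never can.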
Wold [11], pages xi to xiii, describes the method which MV SIM uses to convert an independent sample of size p from the normal (0,1) distribution to a sample from a p variate normal specified by U and Σ. This method requires the computation of a p×p triangular matrix, P, from the original V-C matrix, Σ, so that the following matrix equation holds:

$$P P^{T} = \Sigma.$$

For our discussion we arbitrarily choose the triangulation of $P = \{p_{ij}\}$ so that $p_{ij} = 0$ when $j > i$, i.e., let all "upper diagonal" elements of P equal zero. Next, assuming $x_1,\ldots,x_p$ is an independent sample of size p from normal (0,1), the sample of size one of the p variate normal is computed as:

$$
\begin{aligned}
y_1 &= u_1 + p_{11} x_1 \\
y_2 &= u_2 + p_{21} x_1 + p_{22} x_2 \\
&\;\;\vdots \\
y_p &= u_p + p_{p1} x_1 + \cdots + p_{pp} x_p
\end{aligned}
$$

where the $u_i$ are the elements of mean vector U.
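The triangular construction above is what is now usually called a Cholesky factorization: a lower triangular P with $PP^T = \Sigma$, applied as $y = U + Px$. This minimal sketch uses `numpy.linalg.cholesky`, which returns exactly such a lower triangular factor (and raises `LinAlgError` if Σ is not positive definite); NumPy's normal generator stands in for MV SIM's sum-of-12 normals, and the function name and 2 variate example are hypothetical.

```python
import numpy as np

# Illustrative sketch; the function name and example values are hypothetical.
def mvn_sample(U, Sigma, n, rng):
    """n observations of the p variate normal via y = U + P x, with P P^T = Sigma."""
    U = np.asarray(U, dtype=float)
    P = np.linalg.cholesky(np.asarray(Sigma, dtype=float))  # lower triangular
    X = rng.standard_normal((n, len(U)))   # each row: x_1, ..., x_p
    return U + X @ P.T                     # row form of y = U + P x

Sigma = np.array([[4.0, 2.0],
                  [2.0, 3.0]])
rng = np.random.default_rng(0)
Y = mvn_sample([1.0, -1.0], Sigma, 200_000, rng)
P = np.linalg.cholesky(Sigma)
```

Because P is lower triangular, $y_1$ depends only on $x_1$, $y_2$ on $x_1$ and $x_2$, and so on, exactly as in the equations above.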

The term "pseudo" random number is customarily given to numbers generated by arithmetic means (see Barron [2], pages 5, 6), of which the RAND subroutine is one.

It is now clear that the samples of size n of the p variate normal generated by MV SIM are themselves pseudo random numbers, since they are merely arithmetic functions of uniform pseudo random numbers. Perhaps in this context the operation of this part of MV SIM might have been called "simulation" of a p variate normal, rather than "generation". To carry this process one step further, the sample mean vector Z and V-C matrix S, being arithmetic functions of a sample of size n, are likewise pseudo random matrices. As in the case of the pseudo uniform and normal random numbers, it is desirable that some empirical tests be applied to this pair of pseudo random matrices.

Appendix B describes some tests in detail: one for vector Z and one for matrix S. These tests are (optionally) performed by MV SIM on each sample, but MV SIM takes no corrective action except to print out the values of the computed statistics and an indication of the proper distribution to be compared with the statistics.

The sequential operation of program MV SIM is as follows:

1. Print out the input mean vector U, the V-C matrix Σ, and other miscellaneous data identifying the computer run.
2. Compute the P matrix from Σ as described above. Optionally, the P matrix may be printed out.
3. List the variance of $y_p$, $\sigma_{pp}$.
4. Compute the prediction equation for $y_p$ for each combination of variables, $y_1,\ldots,y_q$, specified by the program user as input. For each such regression the following data are printed:
   a) regression number
   b) q+1 variate normal, where q is the number of variables in regression
   c) multiple correlation coefficient (squared)
   d) conditional variance of $y_p$, $\sigma_{pp\cdot 1,\ldots,q}$
   e) the regression coefficients, $\beta_i$ (optional)
5. Print out input data regarding samples of the specified distribution as described and illustrated in chapter VIII, i.e., number of samples, observation costs, whether the "standard" or "cost" option is used, etc.

The following operations are performed on each sample specified:

6. Generate the required sample of the specified p variate normal.
7. Compute the sample mean vector Z and V-C matrix S. Print out Z and S.
8. Test the sample means, Z (optional) (see Appendix B). Print out the eigenvectors and eigenvalues of matrix S, from which the proper statistic is computed. Also print out the statistic and the proper degrees of freedom of F to be used for comparison.
9. Test the sample matrix S (optional) (see Appendix B). Print out the eigenvalues of sample matrix S. Print out the statistic to be compared with the chi-squared distribution. Also print out the proper degrees of freedom to be used for comparison.

Of course, the user of program MV SIM can omit some of the above items, such as items 3 and 4, at his discretion.

The actual analysis of each sample and associated printed output performed by MV REGRESSION is described and illustrated in detail in chapter VIII.
Appendix B


TESTS  OF  SAMPLE  MEAN  VECTOR,  Z, 

AND  SAMPLE  VARIANCE-COVARIANCE  MATRIX,  S 


For a discussion of some of the problems encountered in generating random numbers by arithmetic means, see Barron [2] and Vaa [10].
Graybill, page 206, shows that if Y is a p variate normal with mean vector U and V-C matrix Σ, then the quantity

$$v = \frac{n(n-p)}{p(n-1)}\,(Z - U)^{T} S^{-1} (Z - U)$$

is distributed as $F_{(p,n-p)}$, if indeed Z and S are computed from a sample of size n from the specified p variate normal. Hence, to test a sample mean vector Z, an appropriate level α (usually .05) is chosen. Then if v is less than $F_{\alpha(p,n-p)}$, vector Z is accepted as having been computed from a reasonable sample; otherwise Z is rejected.

To perform a test for a sample V-C matrix S, an orthogonal transformation is performed on both Σ and S, separately, yielding diagonal matrices Λ and D, respectively. Λ is a V-C matrix of a p variate normal with independent variables (i.e., all covariances are equal to zero). Now, if it is true that S is computed from a sample drawn from a p variate normal with V-C matrix Σ, then D should be a sample drawn from a p variate normal with V-C matrix Λ. Hence, a test that D is a sample from Λ should verify that S is a sample from Σ.

Since each element of D, $s'_{ii}$ ($i = 1,\ldots,p$), is a sample variance, and since each element of Λ, $\sigma'_{ii}$, is the true variance corresponding to element $s'_{ii}$, for all i, intuitively it appears that each of the statistics

$$(n-1)\, s'_{ii} / \sigma'_{ii}, \qquad i = 1,\ldots,p,$$

should have the chi-square distribution with n-1 degrees of freedom (n is still the sample size). From this, and the fact that $s'_{ii}$ is statistically independent of $s'_{jj}$ for all $i, j = 1,\ldots,p$ ($i \ne j$), it follows that the statistic

$$(n-1) \sum_{i=1}^{p} s'_{ii} / \sigma'_{ii} \tag{B1}$$

has the chi-square distribution with p(n-1) degrees of freedom, since the degrees of freedom of sums of independent chi-squares are additive.

Hence, to test each sample V-C matrix S, MV SIM "rotates" Σ and S, and computes formula B1 above from Λ and D. Printed out (optionally) are the p diagonal elements of Λ and D (the eigenvalues of matrices Σ and S, respectively). Also printed are the result of formula B1 and the number of degrees of freedom of the chi-square distribution to be used for comparison.
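Formula B1 can be sketched in two ways. The first follows the description above, rotating Σ and S separately and dividing eigenvalue by eigenvalue; pairing them in sorted order is an assumption, since the text does not say how elements of D are matched to those of Λ. The second rotates S with the eigenvectors $R_1$ of Σ; for that variant the statistic equals $(n-1)\,\mathrm{tr}(\Sigma^{-1}S)$, which for a sample from the specified normal has exactly the chi-square distribution with p(n-1) degrees of freedom. Function names and the 3 variate Σ are hypothetical.

```python
import numpy as np
from scipy import stats

# Illustrative sketch; function names and Sigma are hypothetical.
def b1_statistic(S, Sigma, n):
    """Formula B1 with D and Lambda taken as sorted eigenvalues of S and Sigma."""
    d = np.linalg.eigvalsh(S)          # ascending eigenvalues of S
    lam = np.linalg.eigvalsh(Sigma)    # ascending eigenvalues of Sigma
    return (n - 1) * np.sum(d / lam)

def b1_common_rotation(S, Sigma, n):
    """Variant: rotate S by R_1 (eigenvectors of Sigma) before dividing."""
    lam, R1 = np.linalg.eigh(Sigma)
    d1 = np.diag(R1.T @ S @ R1)
    return (n - 1) * np.sum(d1 / lam)

rng = np.random.default_rng(3)
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.0, 0.2],
                  [0.3, 0.2, 0.5]])
n, p = 100, 3
sample = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S = np.cov(sample, rowvar=False)
df = p * (n - 1)                       # compare with chi-square on p(n-1) d.f.
stat = b1_statistic(S, Sigma, n)
stat_r1 = b1_common_rotation(S, Sigma, n)
p_value = stats.chi2.sf(stat, df)
```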

Programs MV SIM and MV REGRESSION were used to generate and test a number of samples from two different p variate normals. One of these normals is specified by formulas 4.1 and 4.2 (five variate normal). The other distribution was an 18 variate normal that was very close to being singular. (Several sets of rows were close to each other in value.)

Six sample sizes, 50, 100, 150, 200, 250, and 300, were studied for the five variate normal, with 20 samples tested of each size. Four sample sizes, 50, 100, 150, and 200, were studied for the 18 variate normal, with 20 samples tested of each size.

For  the  five  variate  normal,  both  statistics  (for  Z  and  S) 
appeared  to  behave  as  samples  from  their  respective  F  and  chi-squared 
distributions  for  all  sample  sizes. 

However, curious results were obtained for the unusual 18
variate normal. All Z tests passed as cleanly as for the five
variate normal, but the values of chi-square were much too
high, suggesting that poor sample V-C matrices, S, were being generated.
For example, for the 20 samples of size 100 from the 18 variate normal,
statistic B1 should behave as chi-square with 18(100 - 1) = 1782
degrees of freedom (which is also the mean of that distribution). The
20 computed values of B1 ranged from 2213 to 2683.
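To gauge how far out these values are: a chi-square variable with k degrees of freedom has mean k and variance 2k, so with k = 1782 the standard deviation is about 59.7. The short calculation below standardizes the smallest and largest observed values of B1, using the normal approximation to the chi-square distribution (reasonable at this many degrees of freedom); the helper name `chi_square_z` is hypothetical. Even the smallest observed value lies more than seven standard deviations above the mean.

```python
import math


def chi_square_z(value, df):
    """Standardize a value against chi-square(df), which has mean df and
    variance 2 * df (normal approximation, reasonable for large df)."""
    return (value - df) / math.sqrt(2 * df)


p, n = 18, 100
df = p * (n - 1)  # degrees of freedom for statistic B1
print(df, round(math.sqrt(2 * df), 1))
for b1 in (2213, 2683):  # smallest and largest observed values of B1
    print(b1, round(chi_square_z(b1, df), 1))
```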

One possible reason for these poor results is the use of a poor
random number generator. However, the satisfactory results obtained
for the five variate normal, together with earlier tests of the
uniform random number generator, lead one to seek a different source
of error.

A more plausible explanation may be computer round-off error.
The large number of computations required to rotate an 18 x 18 matrix,
together with the fact that the matrices were all nearly singular, could
very likely cause this type of error. If so, the generated sample V-C
matrices may themselves be "good" samples that are merely difficult
to test.

Another interesting possibility concerns the method used to rotate
matrix S for the test. Recall that a symmetric matrix, Σ, can always
be rotated to yield a diagonal matrix, Λ, by finding an orthogonal
matrix, $R_1$, such that

(B2)    $$R_1^{T}\,\Sigma\,R_1 = \Lambda .$$


Since S is also symmetric, an orthogonal matrix $R_2$ can likewise be
found such that

$$R_2^{T}\,S\,R_2 = D,$$

where D is diagonal. Because Σ and S are not exactly equal, the
orthogonal matrices $R_1$ and $R_2$ will, in general, not be equal.

One might argue that a "better" test would be to find $R_1$ from
the rotation of Σ in B2 above, and then compute

$$R_1^{T}\,S\,R_1 = D',$$

where D' should be nearly diagonal if S is a reasonable sample from
Σ, and then compare the diagonal elements of D' and Λ as described
above for D and Λ.
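The alternative test has a simple justification: if $R_1$ diagonalizes Σ, the rotated observations $R_1^T x$ are independent normals with variances equal to the diagonal of Λ, and $R_1^T S R_1$ (call it D') is exactly their sample V-C matrix, so the argument that led to B1 applies to the diagonal of D' unchanged. The sketch below illustrates this under stated assumptions: instead of rotating a given Σ, it builds Σ = $R_1 \Lambda R_1^T$ from a hypothetical Λ and a hypothetical plane rotation $R_1$, so that B2 holds by construction. It then computes D' for each simulated sample and forms B1 from the diagonals of D' and Λ. All function names, the rotation angle, the sample size, and the seed are illustrative choices, not taken from MV SIM.

```python
import math
import random
import statistics


def givens(p, i, j, angle):
    """p x p plane rotation in coordinates (i, j); a hypothetical R_1."""
    r = [[1.0 if a == b else 0.0 for b in range(p)] for a in range(p)]
    c, s = math.cos(angle), math.sin(angle)
    r[i][i], r[i][j], r[j][i], r[j][j] = c, -s, s, c
    return r


def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m_k * v_k for m_k, v_k in zip(row, v)) for row in m]


def sample_vc(xs):
    """Unbiased sample V-C matrix S of the observations xs (divides by n - 1)."""
    n, p = len(xs), len(xs[0])
    means = [sum(x[i] for x in xs) / n for i in range(p)]
    return [[sum((x[i] - means[i]) * (x[j] - means[j]) for x in xs) / (n - 1)
             for j in range(p)] for i in range(p)]


def b1_rotated_by_r1(s, r1, lam, n):
    """Formula B1 from the diagonals of D' = R_1^T S R_1 and Lambda."""
    p = len(lam)
    d_prime = [sum(r1[k][i] * s[k][m] * r1[m][i] for k in range(p) for m in range(p))
               for i in range(p)]
    return (n - 1) * sum(d_ii / lam_i for d_ii, lam_i in zip(d_prime, lam))


lam = [4.0, 1.0, 0.25]       # hypothetical eigenvalues (diagonal of Lambda)
r1 = givens(3, 0, 1, 0.6)    # hypothetical R_1, so Sigma = R_1 Lambda R_1^T
n, reps = 50, 300
rng = random.Random(7)
values = []
for _ in range(reps):
    # x = R_1 Lambda^(1/2) z, with z standard normal, has V-C matrix Sigma.
    xs = [matvec(r1, [math.sqrt(v) * rng.gauss(0.0, 1.0) for v in lam]) for _ in range(n)]
    values.append(b1_rotated_by_r1(sample_vc(xs), r1, lam, n))
df = len(lam) * (n - 1)
print(df, round(statistics.mean(values), 1))
```

As a quick check of the rotation code, replacing S by Σ itself makes D' equal to Λ, so the statistic then equals p(n - 1) exactly, up to rounding.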